« Return to Thread: Towards a revised collection API

Re: Towards a revised collection API

by Christos KK Loverdos :: Rate this Message:

Reply to Author | View in Thread

Wouldn't it be to our benefit to analyze a little the basic components
(traits, classes) of the Scala collection library and then see what we
can do to make it better? I currently do not have the whole picture
and I think there are a lot of people out there that would like to
know the rationale behind, i.e. Projection, RandomAccessSeq, Iterator
etc and the role they play in the design and implementation of the
library.

It is my opinion that only after we pin down these things, we may be
able to contribute something better.

On Sat, Apr 19, 2008 at 2:26 PM, David MacIver <david.maciver@...> wrote:

> On Sat, Apr 19, 2008 at 3:09 AM, Jamie Webb <j@...> wrote:
>  > On 2008-04-18 13:28:48 David MacIver wrote:
>  >  > On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...>
>  >  > wrote:
>  >  > > On 2008-04-18 12:28:38 David MacIver wrote:
>  >  > >  > So, with the recent noise I've been making about the collections
>  >  > >  > API, I figured I should put some thoughts down on screen.
>  >  > >  >
>  >  > >  > As I see it one of the main problems with the current collection
>  >  > >  > API is that in order to make things play nicely you need to
>  >  > >  > override methods at every step of the way. map, filter, flatMap
>  >  > >  > etc all return the "wrong" things. This is a pain to write and
>  >  > >  > easy to get wrong.
>  >  > >
>  >  > >  This is what Martin is talking about fixing with type constructor
>  >  > >  polymorphism, which seems preferable by far.
>  >  >
>  >  > Sure. This solution uses type constructor polymorphism too. But
>  >  > getting it to handle returning the right type automatically without
>  >  > having to constantly reimplement isn't totally trivial.
>  >
>  >  The collections hierarchy is such a core part of the library that I
>  >  don't consider the implementation cost to be anywhere near as important
>  >  as usability or performance. From what I can see, your proposal falls
>  >  down on both those fronts.
>
>  I don't think you've made a convincing argument that performance would
>  suffer. The only issue raised so far is that nonlocal return is slower
>  than it should be, but that's trivial to fix in the compiler/standard
>  library and not so hard to fix outside the standard library with a bit
>  of custom use of exceptions (in actual fact, some benchmarking
>  suggests that it might even be desireable to do so). Indeed, internal
>  iteration is often faster than external iteration, and avoids a lot of
>  intermediate structures so reduces object churn. I wouldn't be
>  surprised if this was a slight performance gain. Equally, I wouldn't
>  be surprised if it was a slight performance loss. I just don't think
>  it's as clear cut as you're making it out to be. No way to tell other
>  than to benchmark really.
>
>  In terms of usability, that needs some work, but I don't find this to
>  be particularly unusable as it stands.
>
>  In terms of it being a core part of the library, the cost of
>  implementation actually is a big deal. Scala currently lacks much in
>  the way of a rich set of functional data structures. This isn't a huge
>  issue - certainly I don't expect the EFPL team to have provided us
>  with such - but the easier we can make it to implement new ones the
>  more likely it is that we'll get such.
>
>
>  >  > My point is you can reimplement basically the entire Iterable trait
>  >  > sans "elements" based on only foreach.
>  >
>  >  You can, and I think it would be good to have a superclass of Iterable
>  >  that does this. However, below that there need to be classes which
>  >  implement these methods in a way that best suits their access
>  >  characteristics. The primary example of this at present is
>  >  RandomAccessSeq, but the same is true of all the variations to some
>  >  degree.
>
>  Well, sure. Subclasses are still welcome to override them to replace
>  them with better versions - it would be madness to make these methods
>  final. The idea is to make it so they do the right thing by default
>  without forcing an override.
>
>  Although, actually, I can't think of any reason you'd want to
>  reimplement these for RandomAccessSeq.
>
>
>  >  > >  Only if we assume that all our operations are non-strict, which I
>  >  > > don't think is a good thing.
>  >  >
>  >  > I think this is ok. The useage I have in mind is that you do a bunch
>  >  > of calculations with a "suspended" result and then immediately thunk
>  >  > the whole thing.
>  >
>  >  So I'd have to write things like this?
>  >
>  >   myNumbers.map(_ * 2).as[List, Int]
>  >
>  >  That's... ugly.
>
>  It can certainly be improved. For example, I think it could be reduced
>  to myNumbers.map(_ * 2) : List[Int] now that the contractiveness
>  condition has been removed, but that's a rather abusive use of
>  implicit conversions. I need to have a play with it and see if I can
>  come up with something in between.
>
>  Even without that, I can definitely reduce it to as[List[Int]], which
>  I find a bit better. I just ran into some last minute trouble with the
>  way Buildable worked and the fact that Enumerable was covariant, so
>  ended up with a slightly ugly signature there.
>
>  And the thing is, yes. This is a bit uglier if you happen to have the
>  right data structure. But even if map, flatMap, etc. all return the
>  "right" thing (which they can't always do. SortedSet for example would
>  require a different type signature for them if they were to return the
>  right type of data structure) you often find you started with a List
>  and now you want a Map. And with this you could equally do:
>
>  myNumbers.map(_ * 2).as[FingerTree[Int]]
>
>  or
>
>  myNumbers.map(x => (x, x)).as[Map[Int, Int]]
>
>  which I find a good deal more pleasant than the current state of play.
>
>  I agree this makes the mainline case a little more awkward, and I need
>  to have a think about how to improve that, but it makes a bunch of
>  other cases much less so.
>
>
>  >
>  >  > I'm not saying this will never be slower. But equally I expect there
>  >  > are a lot of times where it will be faster, particularly when it can
>  >  > eliminate a bunch of intermediate structures.
>  >
>  >  Bear in mind that the library already offers Projections which do this,
>  >  so your version will not be faster.
>
>  Projection works by creating custom iterators to do these things.
>  There's a moderate amount more book keeping to be done to make this
>  work with iterators, and they still implement their methods in terms
>  of those iterators. Particularly for very nested structures like trees
>  this is potentially a non-trivial amount of work, which might make my
>  version faster. That being said, I would be amazed if my proposal
>  ended up substantially faster than using projections all over the
>  place, but hopefully it would be a good deal more convenient. :-)
>
>
>  >  It seems like your Enumerable and Buildable concepts are perfectly
>  >  valuable things to have as part of the library, but there's no way you
>
>  I'd certainly be ok with, for example, offering Iterable as a subtrait
>  of Enumerable and otherwise carrying on as normal. If nothing else it
>  would smooth the transition if (and when ;-) ) it did turn out that I
>  was right, and sometimes it really is convenient to have an iterator
>  (for example, it's really hard to zip two enumerables together).
>
>
>  >  could use them as the basis for a practical collections hierarchy. The
>  >  same is likely true of any other simple solution. The problem is not
>  >  simple.
>
>  Don't agree. Most problems are simple once you get down to the core.
>  But I don't think there's any way for me to convince you of this other
>  than to actually do it. :-)
>
>
>
>  >  I think the collections library needs tidying up and rationalising,
>  >  and updating to make use of current language features. I'd like to see
>  >  it /support/ collections that lack iterators. But I don't see the basic
>  >  design changing much.
>
>



--
 __~O
 -\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com

 « Return to Thread: Towards a revised collection API