On Sat, Apr 19, 2008 at 3:09 AM, Jamie Webb <
j@...> wrote:
> On 2008-04-18 13:28:48 David MacIver wrote:
> > On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <
j@...>
> > wrote:
> > > On 2008-04-18 12:28:38 David MacIver wrote:
> > > > So, with the recent noise I've been making about the collections
> > > > API, I figured I should put some thoughts down on screen.
> > > >
> > > > As I see it one of the main problems with the current collection
> > > > API is that in order to make things play nicely you need to
> > > > override methods at every step of the way. map, filter, flatMap
> > > > etc all return the "wrong" things. This is a pain to write and
> > > > easy to get wrong.
> > >
> > > This is what Martin is talking about fixing with type constructor
> > > polymorphism, which seems preferable by far.
> >
> > Sure. This solution uses type constructor polymorphism too. But
> > getting it to handle returning the right type automatically without
> > having to constantly reimplement isn't totally trivial.
>
> The collections hierarchy is such a core part of the library that I
> don't consider the implementation cost to be anywhere near as important
> as usability or performance. From what I can see, your proposal falls
> down on both those fronts.
I don't think you've made a convincing argument that performance would
suffer. The only issue raised so far is that nonlocal return is slower
than it should be, but that's trivial to fix in the compiler/standard
library and not so hard to fix outside the standard library with a bit
of custom use of exceptions (in actual fact, some benchmarking
suggests that it might even be desireable to do so). Indeed, internal
iteration is often faster than external iteration, and avoids a lot of
intermediate structures so reduces object churn. I wouldn't be
surprised if this was a slight performance gain. Equally, I wouldn't
be surprised if it was a slight performance loss. I just don't think
it's as clear cut as you're making it out to be. No way to tell other
than to benchmark really.
In terms of usability, that needs some work, but I don't find this to
be particularly unusable as it stands.
In terms of it being a core part of the library, the cost of
implementation actually is a big deal. Scala currently lacks much in
the way of a rich set of functional data structures. This isn't a huge
issue - certainly I don't expect the EFPL team to have provided us
with such - but the easier we can make it to implement new ones the
more likely it is that we'll get such.
> > My point is you can reimplement basically the entire Iterable trait
> > sans "elements" based on only foreach.
>
> You can, and I think it would be good to have a superclass of Iterable
> that does this. However, below that there need to be classes which
> implement these methods in a way that best suits their access
> characteristics. The primary example of this at present is
> RandomAccessSeq, but the same is true of all the variations to some
> degree.
Well, sure. Subclasses are still welcome to override them to replace
them with better versions - it would be madness to make these methods
final. The idea is to make it so they do the right thing by default
without forcing an override.
Although, actually, I can't think of any reason you'd want to
reimplement these for RandomAccessSeq.
> > > Only if we assume that all our operations are non-strict, which I
> > > don't think is a good thing.
> >
> > I think this is ok. The useage I have in mind is that you do a bunch
> > of calculations with a "suspended" result and then immediately thunk
> > the whole thing.
>
> So I'd have to write things like this?
>
> myNumbers.map(_ * 2).as[List, Int]
>
> That's... ugly.
It can certainly be improved. For example, I think it could be reduced
to myNumbers.map(_ * 2) : List[Int] now that the contractiveness
condition has been removed, but that's a rather abusive use of
implicit conversions. I need to have a play with it and see if I can
come up with something in between.
Even without that, I can definitely reduce it to as[List[Int]], which
I find a bit better. I just ran into some last minute trouble with the
way Buildable worked and the fact that Enumerable was covariant, so
ended up with a slightly ugly signature there.
And the thing is, yes. This is a bit uglier if you happen to have the
right data structure. But even if map, flatMap, etc. all return the
"right" thing (which they can't always do. SortedSet for example would
require a different type signature for them if they were to return the
right type of data structure) you often find you started with a List
and now you want a Map. And with this you could equally do:
myNumbers.map(_ * 2).as[FingerTree[Int]]
or
myNumbers.map(x => (x, x)).as[Map[Int, Int]]
which I find a good deal more pleasant than the current state of play.
I agree this makes the mainline case a little more awkward, and I need
to have a think about how to improve that, but it makes a bunch of
other cases much less so.
>
> > I'm not saying this will never be slower. But equally I expect there
> > are a lot of times where it will be faster, particularly when it can
> > eliminate a bunch of intermediate structures.
>
> Bear in mind that the library already offers Projections which do this,
> so your version will not be faster.
Projection works by creating custom iterators to do these things.
There's a moderate amount more book keeping to be done to make this
work with iterators, and they still implement their methods in terms
of those iterators. Particularly for very nested structures like trees
this is potentially a non-trivial amount of work, which might make my
version faster. That being said, I would be amazed if my proposal
ended up substantially faster than using projections all over the
place, but hopefully it would be a good deal more convenient. :-)
> It seems like your Enumerable and Buildable concepts are perfectly
> valuable things to have as part of the library, but there's no way you
I'd certainly be ok with, for example, offering Iterable as a subtrait
of Enumerable and otherwise carrying on as normal. If nothing else it
would smooth the transition if (and when ;-) ) it did turn out that I
was right, and sometimes it really is convenient to have an iterator
(for example, it's really hard to zip two enumerables together).
> could use them as the basis for a practical collections hierarchy. The
> same is likely true of any other simple solution. The problem is not
> simple.
Don't agree. Most problems are simple once you get down to the core.
But I don't think there's any way for me to convince you of this other
than to actually do it. :-)
> I think the collections library needs tidying up and rationalising,
> and updating to make use of current language features. I'd like to see
> it /support/ collections that lack iterators. But I don't see the basic
> design changing much.