So, with the recent noise I've been making about the collections API,
I figured I should put some thoughts down on screen.
As I see it one of the main problems with the current collection API
is that in order to make things play nicely you need to override
methods at every step of the way. map, filter, flatMap etc all return
the "wrong" things. This is a pain to write and easy to get wrong.
The other thing I don't like about the collections API is that I'm not
a big fan of iterators. They're really cumbersome to create for custom
collection types and are designed around a really irritating "check
then proceed" style. Really all we care about is how to traverse the
data structure, and constructing an iterator to do so is often not the
best way.
Here's something I threw together last night to try to address these
problems. The basic principles are:
- It should be easy to convert between collection types
- Consequently it's ok if we return the "wrong" thing.
- In fact it may be desirable to do so!
- All we care about generically are how to traverse a data structure
and how to build one, so these are the operations the API should be
designed around.
So the API defines a trait Enumerable which describes how to traverse
a structure and a trait Buildable which describes how to build one
(the data structure doesn't implement Buildable, one makes an implicit
instance available). Implementations do not override the enumerable
map, flatMap, etc. methods - these continue to return a generic
enumerable. However at the end of a bunch of operations you explicitly
"cast" to the desired result.
So for example
scala> for (i <- range(1, 4); j <- range(1, 4); if (i != j)) yield (i, j)
res8: enumerable.Enumerable[(Int, Int)] = enumerable.Enumerable$$anon$4@708aff
scala> res8.intersperse(",").mkString
res9: String = (1,2),(1,3),(2,1),(2,3),(3,1),(3,2)
scala> res8.as[List, (Int, Int)]
res10: List[(Int, Int)] = List((1,2), (1,3), (2,1), (2,3), (3,1), (3,2))
(you need to specify the type like that in order to make this work
because of Enumerable being covariant in its type)
I've not benchmarked, but for large for comprehensions this should
actually be more efficient, as it avoids the construction of
intermediate datastructures.
We can use return from inside passed closures to control evaluation
and exit the loop prematurely, which allows efficient implementation
of things like exists, foreach and take. For example:
scala> def from(x : Int) = new Enumerable[Int] { def foreach(f : Int
=> Unit) = {var i = x; while(true) { f(i); i =} } } i + 1} } }
from: (Int)java.lang.Object with enumerable.Enumerable[Int]
scala> from(0).take(10).foreach(println);
0
1
2
3
4
5
6
7
8
9
The exact structure of Buildable needs work (it doesn't currently
handle sorted collections well for example), but I'm reaonably happy
with how Enumerable works.
Any thoughts?