|
View:
New views
20 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 | Next > |
|
|
Towards a revised collection APISo, with the recent noise I've been making about the collections API,
I figured I should put some thoughts down on screen. As I see it one of the main problems with the current collection API is that in order to make things play nicely you need to override methods at every step of the way. map, filter, flatMap etc all return the "wrong" things. This is a pain to write and easy to get wrong. The other thing I don't like about the collections API is that I'm not a big fan of iterators. They're really cumbersome to create for custom collection types and are designed around a really irritating "check then proceed" style. Really all we care about is how to traverse the data structure, and constructing an iterator to do so is often not the best way. Here's something I threw together last night to try to address these problems. The basic principles are: - It should be easy to convert between collection types - Consequently it's ok if we return the "wrong" thing. - In fact it may be desirable to do so! - All we care about generically are how to traverse a data structure and how to build one, so these are the operations the API should be designed around. So the API defines a trait Enumerable which describes how to traverse a structure and a trait Buildable which describes how to build one (the data structure doesn't implement Buildable, one makes an implicit instance available). Implementations do not override the enumerable map, flatMap, etc. methods - these continue to return a generic enumerable. However at the end of a bunch of operations you explicitly "cast" to the desired result. So for example scala> for (i <- range(1, 4); j <- range(1, 4); if (i != j)) yield (i, j) res8: enumerable.Enumerable[(Int, Int)] = enumerable.Enumerable$$anon$4@708aff scala> res8.intersperse(",").mkString res9: String = (1,2),(1,3),(2,1),(2,3),(3,1),(3,2) scala> res8.as[List, (Int, Int)] res10: List[(Int, Int)] = List((1,2), (1,3), (2,1), (2,3), (3,1), (3,2)) (you need to specify the type like that in order to make this work because of Enumerable being covariant in its type) I've not benchmarked, but for large for comprehensions this should actually be more efficient, as it avoids the construction of intermediate datastructures. We can use return from inside passed closures to control evaluation and exit the loop prematurely, which allows efficient implementation of things like exists, foreach and take. For example: scala> def from(x : Int) = new Enumerable[Int] { def foreach(f : Int => Unit) = {var i = x; while(true) { f(i); i =} } } i + 1} } } from: (Int)java.lang.Object with enumerable.Enumerable[Int] scala> from(0).take(10).foreach(println); 0 1 2 3 4 5 6 7 8 9 The exact structure of Buildable needs work (it doesn't currently handle sorted collections well for example), but I'm reaonably happy with how Enumerable works. Any thoughts? |
|
|
Re: Towards a revised collection APIHave you seen the paper "Generics of a Higher Kind"?
http://lambda-the-ultimate.org/node/2579 They introduce a Buildable trait that does the job. Moreover, Martin has just said in scala-debate that they are redesigning the collections in order to use the ideas in the paper, and they do so in a backwards-compatible way. It will take them a couple of months before they go public. On Fri, Apr 18, 2008 at 2:28 PM, David MacIver <david.maciver@...> wrote: > So, with the recent noise I've been making about the collections API, > I figured I should put some thoughts down on screen. > > As I see it one of the main problems with the current collection API > is that in order to make things play nicely you need to override > methods at every step of the way. map, filter, flatMap etc all return > the "wrong" things. This is a pain to write and easy to get wrong. > > The other thing I don't like about the collections API is that I'm not > a big fan of iterators. They're really cumbersome to create for custom > collection types and are designed around a really irritating "check > then proceed" style. Really all we care about is how to traverse the > data structure, and constructing an iterator to do so is often not the > best way. > > Here's something I threw together last night to try to address these > problems. The basic principles are: > > - It should be easy to convert between collection types > - Consequently it's ok if we return the "wrong" thing. > - In fact it may be desirable to do so! > - All we care about generically are how to traverse a data structure > and how to build one, so these are the operations the API should be > designed around. > > So the API defines a trait Enumerable which describes how to traverse > a structure and a trait Buildable which describes how to build one > (the data structure doesn't implement Buildable, one makes an implicit > instance available). Implementations do not override the enumerable > map, flatMap, etc. methods - these continue to return a generic > enumerable. However at the end of a bunch of operations you explicitly > "cast" to the desired result. > > So for example > > scala> for (i <- range(1, 4); j <- range(1, 4); if (i != j)) yield (i, j) > res8: enumerable.Enumerable[(Int, Int)] = enumerable.Enumerable$$anon$4@708aff > > scala> res8.intersperse(",").mkString > res9: String = (1,2),(1,3),(2,1),(2,3),(3,1),(3,2) > > scala> res8.as[List, (Int, Int)] > res10: List[(Int, Int)] = List((1,2), (1,3), (2,1), (2,3), (3,1), (3,2)) > > (you need to specify the type like that in order to make this work > because of Enumerable being covariant in its type) > > I've not benchmarked, but for large for comprehensions this should > actually be more efficient, as it avoids the construction of > intermediate datastructures. > > We can use return from inside passed closures to control evaluation > and exit the loop prematurely, which allows efficient implementation > of things like exists, foreach and take. For example: > > scala> def from(x : Int) = new Enumerable[Int] { def foreach(f : Int > => Unit) = {var i = x; while(true) { f(i); i =} } } i + 1} } } > from: (Int)java.lang.Object with enumerable.Enumerable[Int] > > scala> from(0).take(10).foreach(println); > 0 > 1 > 2 > 3 > 4 > 5 > 6 > 7 > 8 > 9 > > The exact structure of Buildable needs work (it doesn't currently > handle sorted collections well for example), but I'm reaonably happy > with how Enumerable works. > > Any thoughts? > -- __~O -\ <, Christos KK Loverdos (*)/ (*) http://ckkloverdos.com |
|
|
Re: Towards a revised collection APIOn Fri, Apr 18, 2008 at 12:39 PM, Christos KK Loverdos
<loverdos@...> wrote: > Have you seen the paper "Generics of a Higher Kind"? > > http://lambda-the-ultimate.org/node/2579 Yes, the Buildable trait is based on that. Obviously. :-) The more interesting part of this is the dual of that, the Enumerable trait. > They introduce a Buildable trait that does the job. Moreover, Martin > has just said in scala-debate that they are redesigning the > collections in order to use the ideas in the paper, and they do so in > a backwards-compatible way. It will take them a couple of months > before they go public. Which is all the more reason to discuss this now rather than in a few months once the decision has been finalised... |
|
|
Re: Towards a revised collection APIOn 2008-04-18 12:28:38 David MacIver wrote:
> So, with the recent noise I've been making about the collections API, > I figured I should put some thoughts down on screen. > > As I see it one of the main problems with the current collection API > is that in order to make things play nicely you need to override > methods at every step of the way. map, filter, flatMap etc all return > the "wrong" things. This is a pain to write and easy to get wrong. This is what Martin is talking about fixing with type constructor polymorphism, which seems preferable by far. > The other thing I don't like about the collections API is that I'm not > a big fan of iterators. They're really cumbersome to create for custom > collection types and are designed around a really irritating "check > then proceed" style. Really all we care about is how to traverse the > data structure, and constructing an iterator to do so is often not the > best way. Agreed. See scalax.control.ManagedSequence for some code that gets badly bitten by this. However, aside from creating one or more superclasses of Iterable that define the more functional interfaces, it's hard to see how to improve. > Here's something I threw together last night to try to address these > problems. The basic principles are: > > - It should be easy to convert between collection types Fair enough. > - Consequently it's ok if we return the "wrong" thing. Only if we assume that all our operations are non-strict, which I don't think is a good thing. > - In fact it may be desirable to do so! > - All we care about generically are how to traverse a data structure > and how to build one, so these are the operations the API should be > designed around. Clearly, but that is already the case. > So the API defines a trait Enumerable which describes how to traverse > a structure and a trait Buildable which describes how to build one > (the data structure doesn't implement Buildable, one makes an implicit > instance available). Implementations do not override the enumerable > map, flatMap, etc. methods - these continue to return a generic > enumerable. However at the end of a bunch of operations you explicitly > "cast" to the desired result. Do you have some type signatures for these classes that you can show us? ... > We can use return from inside passed closures to control evaluation > and exit the loop prematurely, which allows efficient implementation > of things like exists, foreach and take. For example: Non-local returns are implemented via exceptions, so they are not really efficient. I think any move away from iterators is likely to imply a performance loss. /J |
|
|
Re: Towards a revised collection APIOn Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...> wrote:
> On 2008-04-18 12:28:38 David MacIver wrote: > > So, with the recent noise I've been making about the collections API, > > I figured I should put some thoughts down on screen. > > > > As I see it one of the main problems with the current collection API > > is that in order to make things play nicely you need to override > > methods at every step of the way. map, filter, flatMap etc all return > > the "wrong" things. This is a pain to write and easy to get wrong. > > This is what Martin is talking about fixing with type constructor > polymorphism, which seems preferable by far. Sure. This solution uses type constructor polymorphism too. But getting it to handle returning the right type automatically without having to constantly reimplement isn't totally trivial. > > The other thing I don't like about the collections API is that I'm not > > a big fan of iterators. They're really cumbersome to create for custom > > collection types and are designed around a really irritating "check > > then proceed" style. Really all we care about is how to traverse the > > data structure, and constructing an iterator to do so is often not the > > best way. > > Agreed. See scalax.control.ManagedSequence for some code that gets > badly bitten by this. However, aside from creating one or more > superclasses of Iterable that define the more functional interfaces, > it's hard to see how to improve. Either the mailing list stripped the attached file, or you missed it. In the case that it's the former, here's a pastebin: http://pastebin.com/m23106284 My point is you can reimplement basically the entire Iterable trait sans "elements" based on only foreach. > > - Consequently it's ok if we return the "wrong" thing. > > Only if we assume that all our operations are non-strict, which I don't > think is a good thing. I think this is ok. The useage I have in mind is that you do a bunch of calculations with a "suspended" result and then immediately thunk the whole thing. > > - In fact it may be desirable to do so! > > - All we care about generically are how to traverse a data structure > > and how to build one, so these are the operations the API should be > > designed around. > > Clearly, but that is already the case. Not really. The API is poorly constructed for building, and the traversal strategies center around creating iterators. > > So the API defines a trait Enumerable which describes how to traverse > > a structure and a trait Buildable which describes how to build one > > (the data structure doesn't implement Buildable, one makes an implicit > > instance available). Implementations do not override the enumerable > > map, flatMap, etc. methods - these continue to return a generic > > enumerable. However at the end of a bunch of operations you explicitly > > "cast" to the desired result. > > Do you have some type signatures for these classes that you can show us? As before, see attached file / pastebin. > > We can use return from inside passed closures to control evaluation > > and exit the loop prematurely, which allows efficient implementation > > of things like exists, foreach and take. For example: > > Non-local returns are implemented via exceptions, so they are not > really efficient. I think any move away from iterators is likely to > imply a performance loss. I disagree. Exceptions are extremely fast on the JVM and have been for years. They basically compile to gotos. It's just stack trace generation which is slow, which you can surpress in your implementation of Throwable (which I assume Scala's non-local returns do. If not, I'd consider that a bug. I'll check later). I'm not saying this will never be slower. But equally I expect there are a lot of times where it will be faster, particularly when it can eliminate a bunch of intermediate structures. |
|
|
Re: Towards a revised collection APIOn Fri, Apr 18, 2008 at 1:28 PM, David MacIver <david.maciver@...> wrote:
> > Non-local returns are implemented via exceptions, so they are not > > really efficient. I think any move away from iterators is likely to > > imply a performance loss. > > I disagree. Exceptions are extremely fast on the JVM and have been for > years. They basically compile to gotos. It's just stack trace > generation which is slow, which you can surpress in your > implementation of Throwable (which I assume Scala's non-local returns > do. If not, I'd consider that a bug. I'll check later). Looks like it doesn't. I've filed a bug: https://lampsvn.epfl.ch/trac/scala/ticket/775 |
|
|
Re: Towards a revised collection APIOn 2008-04-18 13:53:20 David MacIver wrote:
> On Fri, Apr 18, 2008 at 1:28 PM, David MacIver > <david.maciver@...> wrote: > > > Non-local returns are implemented via exceptions, so they are > > > not really efficient. I think any move away from iterators is > > > likely to imply a performance loss. > > > > I disagree. Exceptions are extremely fast on the JVM and have been > > for years. They basically compile to gotos. It's just stack trace > > generation which is slow, which you can surpress in your > > implementation of Throwable (which I assume Scala's non-local > > returns do. If not, I'd consider that a bug. I'll check later). > > Looks like it doesn't. I've filed a bug: > https://lampsvn.epfl.ch/trac/scala/ticket/775 There's a reason: if a closure escapes its defining method then the exception will propagated as normal. This would be a pain to track down without a stack trace. /J |
|
|
Re: Towards a revised collection APIOn Fri, Apr 18, 2008 at 5:53 PM, Jamie Webb <j@...> wrote:
> > On 2008-04-18 13:53:20 David MacIver wrote: > > On Fri, Apr 18, 2008 at 1:28 PM, David MacIver > > <david.maciver@...> wrote: > > > > Non-local returns are implemented via exceptions, so they are > > > > not really efficient. I think any move away from iterators is > > > > likely to imply a performance loss. > > > > > > I disagree. Exceptions are extremely fast on the JVM and have been > > > for years. They basically compile to gotos. It's just stack trace > > > generation which is slow, which you can surpress in your > > > implementation of Throwable (which I assume Scala's non-local > > > returns do. If not, I'd consider that a bug. I'll check later). > > > > Looks like it doesn't. I've filed a bug: > > https://lampsvn.epfl.ch/trac/scala/ticket/775 > > There's a reason: if a closure escapes its defining method then the > exception will propagated as normal. This would be a pain to track down > without a stack trace. I don't buy that. It's a sufficiently rare error that breaking the performance of this feature so totally in an attempt to make debugging of the it slightly easier is really pointless. In any case, if this is really deemed to be a problem there are solutions of varying degrees of difficulty The easiest it to have a global debug parameter somewhere that can be set and have the stack trace be suppressed only if this isn't set. |
|
|
Re: Towards a revised collection APIOn 2008-04-18 13:28:48 David MacIver wrote:
> On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...> > wrote: > > On 2008-04-18 12:28:38 David MacIver wrote: > > > So, with the recent noise I've been making about the collections > > > API, I figured I should put some thoughts down on screen. > > > > > > As I see it one of the main problems with the current collection > > > API is that in order to make things play nicely you need to > > > override methods at every step of the way. map, filter, flatMap > > > etc all return the "wrong" things. This is a pain to write and > > > easy to get wrong. > > > > This is what Martin is talking about fixing with type constructor > > polymorphism, which seems preferable by far. > > Sure. This solution uses type constructor polymorphism too. But > getting it to handle returning the right type automatically without > having to constantly reimplement isn't totally trivial. The collections hierarchy is such a core part of the library that I don't consider the implementation cost to be anywhere near as important as usability or performance. From what I can see, your proposal falls down on both those fronts. > > > The other thing I don't like about the collections API is that > > > I'm not a big fan of iterators. They're really cumbersome to > > > create for custom collection types and are designed around a > > > really irritating "check then proceed" style. Really all we care > > > about is how to traverse the data structure, and constructing an > > > iterator to do so is often not the best way. > > > > Agreed. See scalax.control.ManagedSequence for some code that gets > > badly bitten by this. However, aside from creating one or more > > superclasses of Iterable that define the more functional > > interfaces, it's hard to see how to improve. > > Either the mailing list stripped the attached file, or you missed it. > In the case that it's the former, here's a pastebin: > http://pastebin.com/m23106284 Sorry, I missed it. > My point is you can reimplement basically the entire Iterable trait > sans "elements" based on only foreach. You can, and I think it would be good to have a superclass of Iterable that does this. However, below that there need to be classes which implement these methods in a way that best suits their access characteristics. The primary example of this at present is RandomAccessSeq, but the same is true of all the variations to some degree. > > > - Consequently it's ok if we return the "wrong" thing. > > > > Only if we assume that all our operations are non-strict, which I > > don't think is a good thing. > > I think this is ok. The useage I have in mind is that you do a bunch > of calculations with a "suspended" result and then immediately thunk > the whole thing. So I'd have to write things like this? myNumbers.map(_ * 2).as[List, Int] That's... ugly. > I'm not saying this will never be slower. But equally I expect there > are a lot of times where it will be faster, particularly when it can > eliminate a bunch of intermediate structures. Bear in mind that the library already offers Projections which do this, so your version will not be faster. It seems like your Enumerable and Buildable concepts are perfectly valuable things to have as part of the library, but there's no way you could use them as the basis for a practical collections hierarchy. The same is likely true of any other simple solution. The problem is not simple. I think the collections library needs tidying up and rationalising, and updating to make use of current language features. I'd like to see it /support/ collections that lack iterators. But I don't see the basic design changing much. /J |
|
|
Re: Towards a revised collection APIOn Sat, Apr 19, 2008 at 3:09 AM, Jamie Webb <j@...> wrote:
> On 2008-04-18 13:28:48 David MacIver wrote: > > On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...> > > wrote: > > > On 2008-04-18 12:28:38 David MacIver wrote: > > > > So, with the recent noise I've been making about the collections > > > > API, I figured I should put some thoughts down on screen. > > > > > > > > As I see it one of the main problems with the current collection > > > > API is that in order to make things play nicely you need to > > > > override methods at every step of the way. map, filter, flatMap > > > > etc all return the "wrong" things. This is a pain to write and > > > > easy to get wrong. > > > > > > This is what Martin is talking about fixing with type constructor > > > polymorphism, which seems preferable by far. > > > > Sure. This solution uses type constructor polymorphism too. But > > getting it to handle returning the right type automatically without > > having to constantly reimplement isn't totally trivial. > > The collections hierarchy is such a core part of the library that I > don't consider the implementation cost to be anywhere near as important > as usability or performance. From what I can see, your proposal falls > down on both those fronts. I don't think you've made a convincing argument that performance would suffer. The only issue raised so far is that nonlocal return is slower than it should be, but that's trivial to fix in the compiler/standard library and not so hard to fix outside the standard library with a bit of custom use of exceptions (in actual fact, some benchmarking suggests that it might even be desireable to do so). Indeed, internal iteration is often faster than external iteration, and avoids a lot of intermediate structures so reduces object churn. I wouldn't be surprised if this was a slight performance gain. Equally, I wouldn't be surprised if it was a slight performance loss. I just don't think it's as clear cut as you're making it out to be. No way to tell other than to benchmark really. In terms of usability, that needs some work, but I don't find this to be particularly unusable as it stands. In terms of it being a core part of the library, the cost of implementation actually is a big deal. Scala currently lacks much in the way of a rich set of functional data structures. This isn't a huge issue - certainly I don't expect the EFPL team to have provided us with such - but the easier we can make it to implement new ones the more likely it is that we'll get such. > > My point is you can reimplement basically the entire Iterable trait > > sans "elements" based on only foreach. > > You can, and I think it would be good to have a superclass of Iterable > that does this. However, below that there need to be classes which > implement these methods in a way that best suits their access > characteristics. The primary example of this at present is > RandomAccessSeq, but the same is true of all the variations to some > degree. Well, sure. Subclasses are still welcome to override them to replace them with better versions - it would be madness to make these methods final. The idea is to make it so they do the right thing by default without forcing an override. Although, actually, I can't think of any reason you'd want to reimplement these for RandomAccessSeq. > > > Only if we assume that all our operations are non-strict, which I > > > don't think is a good thing. > > > > I think this is ok. The useage I have in mind is that you do a bunch > > of calculations with a "suspended" result and then immediately thunk > > the whole thing. > > So I'd have to write things like this? > > myNumbers.map(_ * 2).as[List, Int] > > That's... ugly. It can certainly be improved. For example, I think it could be reduced to myNumbers.map(_ * 2) : List[Int] now that the contractiveness condition has been removed, but that's a rather abusive use of implicit conversions. I need to have a play with it and see if I can come up with something in between. Even without that, I can definitely reduce it to as[List[Int]], which I find a bit better. I just ran into some last minute trouble with the way Buildable worked and the fact that Enumerable was covariant, so ended up with a slightly ugly signature there. And the thing is, yes. This is a bit uglier if you happen to have the right data structure. But even if map, flatMap, etc. all return the "right" thing (which they can't always do. SortedSet for example would require a different type signature for them if they were to return the right type of data structure) you often find you started with a List and now you want a Map. And with this you could equally do: myNumbers.map(_ * 2).as[FingerTree[Int]] or myNumbers.map(x => (x, x)).as[Map[Int, Int]] which I find a good deal more pleasant than the current state of play. I agree this makes the mainline case a little more awkward, and I need to have a think about how to improve that, but it makes a bunch of other cases much less so. > > > I'm not saying this will never be slower. But equally I expect there > > are a lot of times where it will be faster, particularly when it can > > eliminate a bunch of intermediate structures. > > Bear in mind that the library already offers Projections which do this, > so your version will not be faster. Projection works by creating custom iterators to do these things. There's a moderate amount more book keeping to be done to make this work with iterators, and they still implement their methods in terms of those iterators. Particularly for very nested structures like trees this is potentially a non-trivial amount of work, which might make my version faster. That being said, I would be amazed if my proposal ended up substantially faster than using projections all over the place, but hopefully it would be a good deal more convenient. :-) > It seems like your Enumerable and Buildable concepts are perfectly > valuable things to have as part of the library, but there's no way you I'd certainly be ok with, for example, offering Iterable as a subtrait of Enumerable and otherwise carrying on as normal. If nothing else it would smooth the transition if (and when ;-) ) it did turn out that I was right, and sometimes it really is convenient to have an iterator (for example, it's really hard to zip two enumerables together). > could use them as the basis for a practical collections hierarchy. The > same is likely true of any other simple solution. The problem is not > simple. Don't agree. Most problems are simple once you get down to the core. But I don't think there's any way for me to convince you of this other than to actually do it. :-) > I think the collections library needs tidying up and rationalising, > and updating to make use of current language features. I'd like to see > it /support/ collections that lack iterators. But I don't see the basic > design changing much. |
|
|
Re: Towards a revised collection APIWouldn't it be to our benefit to analyze a little the basic components
(traits, classes) of the Scala collection library and then see what we can do to make it better? I currently do not have the whole picture and I think there are a lot of people out there that would like to know the rationale behind, i.e. Projection, RandomAccessSeq, Iterator etc and the role they play in the design and implementation of the library. It is my opinion that only after we pin down these things, we may be able to contribute something better. On Sat, Apr 19, 2008 at 2:26 PM, David MacIver <david.maciver@...> wrote: > On Sat, Apr 19, 2008 at 3:09 AM, Jamie Webb <j@...> wrote: > > On 2008-04-18 13:28:48 David MacIver wrote: > > > On Fri, Apr 18, 2008 at 1:13 PM, Jamie Webb <j@...> > > > wrote: > > > > On 2008-04-18 12:28:38 David MacIver wrote: > > > > > So, with the recent noise I've been making about the collections > > > > > API, I figured I should put some thoughts down on screen. > > > > > > > > > > As I see it one of the main problems with the current collection > > > > > API is that in order to make things play nicely you need to > > > > > override methods at every step of the way. map, filter, flatMap > > > > > etc all return the "wrong" things. This is a pain to write and > > > > > easy to get wrong. > > > > > > > > This is what Martin is talking about fixing with type constructor > > > > polymorphism, which seems preferable by far. > > > > > > Sure. This solution uses type constructor polymorphism too. But > > > getting it to handle returning the right type automatically without > > > having to constantly reimplement isn't totally trivial. > > > > The collections hierarchy is such a core part of the library that I > > don't consider the implementation cost to be anywhere near as important > > as usability or performance. From what I can see, your proposal falls > > down on both those fronts. > > I don't think you've made a convincing argument that performance would > suffer. The only issue raised so far is that nonlocal return is slower > than it should be, but that's trivial to fix in the compiler/standard > library and not so hard to fix outside the standard library with a bit > of custom use of exceptions (in actual fact, some benchmarking > suggests that it might even be desireable to do so). Indeed, internal > iteration is often faster than external iteration, and avoids a lot of > intermediate structures so reduces object churn. I wouldn't be > surprised if this was a slight performance gain. Equally, I wouldn't > be surprised if it was a slight performance loss. I just don't think > it's as clear cut as you're making it out to be. No way to tell other > than to benchmark really. > > In terms of usability, that needs some work, but I don't find this to > be particularly unusable as it stands. > > In terms of it being a core part of the library, the cost of > implementation actually is a big deal. Scala currently lacks much in > the way of a rich set of functional data structures. This isn't a huge > issue - certainly I don't expect the EFPL team to have provided us > with such - but the easier we can make it to implement new ones the > more likely it is that we'll get such. > > > > > My point is you can reimplement basically the entire Iterable trait > > > sans "elements" based on only foreach. > > > > You can, and I think it would be good to have a superclass of Iterable > > that does this. However, below that there need to be classes which > > implement these methods in a way that best suits their access > > characteristics. The primary example of this at present is > > RandomAccessSeq, but the same is true of all the variations to some > > degree. > > Well, sure. Subclasses are still welcome to override them to replace > them with better versions - it would be madness to make these methods > final. The idea is to make it so they do the right thing by default > without forcing an override. > > Although, actually, I can't think of any reason you'd want to > reimplement these for RandomAccessSeq. > > > > > > Only if we assume that all our operations are non-strict, which I > > > > don't think is a good thing. > > > > > > I think this is ok. The useage I have in mind is that you do a bunch > > > of calculations with a "suspended" result and then immediately thunk > > > the whole thing. > > > > So I'd have to write things like this? > > > > myNumbers.map(_ * 2).as[List, Int] > > > > That's... ugly. > > It can certainly be improved. For example, I think it could be reduced > to myNumbers.map(_ * 2) : List[Int] now that the contractiveness > condition has been removed, but that's a rather abusive use of > implicit conversions. I need to have a play with it and see if I can > come up with something in between. > > Even without that, I can definitely reduce it to as[List[Int]], which > I find a bit better. I just ran into some last minute trouble with the > way Buildable worked and the fact that Enumerable was covariant, so > ended up with a slightly ugly signature there. > > And the thing is, yes. This is a bit uglier if you happen to have the > right data structure. But even if map, flatMap, etc. all return the > "right" thing (which they can't always do. SortedSet for example would > require a different type signature for them if they were to return the > right type of data structure) you often find you started with a List > and now you want a Map. And with this you could equally do: > > myNumbers.map(_ * 2).as[FingerTree[Int]] > > or > > myNumbers.map(x => (x, x)).as[Map[Int, Int]] > > which I find a good deal more pleasant than the current state of play. > > I agree this makes the mainline case a little more awkward, and I need > to have a think about how to improve that, but it makes a bunch of > other cases much less so. > > > > > > > I'm not saying this will never be slower. But equally I expect there > > > are a lot of times where it will be faster, particularly when it can > > > eliminate a bunch of intermediate structures. > > > > Bear in mind that the library already offers Projections which do this, > > so your version will not be faster. > > Projection works by creating custom iterators to do these things. > There's a moderate amount more book keeping to be done to make this > work with iterators, and they still implement their methods in terms > of those iterators. Particularly for very nested structures like trees > this is potentially a non-trivial amount of work, which might make my > version faster. That being said, I would be amazed if my proposal > ended up substantially faster than using projections all over the > place, but hopefully it would be a good deal more convenient. :-) > > > > It seems like your Enumerable and Buildable concepts are perfectly > > valuable things to have as part of the library, but there's no way you > > I'd certainly be ok with, for example, offering Iterable as a subtrait > of Enumerable and otherwise carrying on as normal. If nothing else it > would smooth the transition if (and when ;-) ) it did turn out that I > was right, and sometimes it really is convenient to have an iterator > (for example, it's really hard to zip two enumerables together). > > > > could use them as the basis for a practical collections hierarchy. The > > same is likely true of any other simple solution. The problem is not > > simple. > > Don't agree. Most problems are simple once you get down to the core. > But I don't think there's any way for me to convince you of this other > than to actually do it. :-) > > > > > I think the collections library needs tidying up and rationalising, > > and updating to make use of current language features. I'd like to see > > it /support/ collections that lack iterators. But I don't see the basic > > design changing much. > > -- __~O -\ <, Christos KK Loverdos (*)/ (*) http://ckkloverdos.com |
|
|
Re: Towards a revised collection APIOn 2008-04-19 12:26:10 David MacIver wrote:
> I don't think you've made a convincing argument that performance would > suffer. My point is that performance would not improve, and that reimplementations in subclasses would be at least as necessary as they are now in order to maintain the current level of performance. > Although, actually, I can't think of any reason you'd want to > reimplement these for RandomAccessSeq. If you implement e.g. map over a RandomAccessSeq using foreach, you lose random access. > It can certainly be improved. For example, I think it could be reduced > to myNumbers.map(_ * 2) : List[Int] now that the contractiveness > condition has been removed, but that's a rather abusive use of > implicit conversions. I need to have a play with it and see if I can > come up with something in between. That's still horrible though. I suspect that whatever Martin and Adriaan come up with will be rather more palatable. > I agree this makes the mainline case a little more awkward, and I need > to have a think about how to improve that, but it makes a bunch of > other cases much less so. Conversion between collections is (potentially) a pretty orthogonal issue. It would be trivial to allow code such as: myNumbers.projection.map(_ * 2).as(FingerTree) And I don't even need any implicits to do it. Incidentally, don't you think that your Builders look an awful lot like the opposite of an Iterator? That is, following the direction you've taken with Enumerable I imagine a somehow more 'pure' way to build a new data structure would look like: object List extends Maker[List] { def make[A](e : Enumerable[A]) = { val b = new Listbuffer[A] e.foreach(b += _) b.toList } } > I'd certainly be ok with, for example, offering Iterable as a subtrait > of Enumerable and otherwise carrying on as normal. If nothing else it > would smooth the transition if (and when ;-) ) it did turn out that I > was right, and sometimes it really is convenient to have an iterator > (for example, it's really hard to zip two enumerables together). But we ought reasonably to be able to zip any ordered collections, right? So they all need iterators to do so efficiently. > > could use them as the basis for a practical collections hierarchy. > > The same is likely true of any other simple solution. The problem > > is not simple. > > Don't agree. Most problems are simple once you get down to the core. If by 'get down to the core' you mean 'discard half the requirements', then yes I agree... To summarise, what I like: - A reduction in the importance of Iterators - A mechanism for more general conversions between collections What I don't like: - Methods that return Enumerable rather than the expected type - An assumption that any single access method will be applicable to all data structures and operations. /J |
|
|
Re: Towards a revised collection APIOn Sat, Apr 19, 2008 at 6:49 PM, Jamie Webb <j@...> wrote:
> On 2008-04-19 12:26:10 David MacIver wrote: > > I don't think you've made a convincing argument that performance would > > suffer. > > My point is that performance would not improve, and that Don't agree. I'm not making this up - internal iteration really is often faster, and the suggestion is not without precedent. It's at least partially based on http://okmij.org/ftp/papers/LL3-collections-enumerators.txt (note: That's oleg's site. It's generally a good idea to assume he knows what he's talking about). Parts of what are suggested become a bit more difficult due to the lack of continuations, but the basic idea carries over. > reimplementations in subclasses would be at least as necessary as they > are now in order to maintain the current level of performance. Also don't agree. There very often isn't a lot of performance benefit to be gained > > Although, actually, I can't think of any reason you'd want to > > reimplement these for RandomAccessSeq. > > If you implement e.g. map over a RandomAccessSeq using foreach, you > lose random access. Well, yes. That's sortof the point of this design. If you want a RandomAccessSeq you get one back out at the end. In particular the combination foo.map(bar).as[Array, Stuff] will end up compiling to something basically equivalent to the override you'd want to make. > > It can certainly be improved. For example, I think it could be reduced > > to myNumbers.map(_ * 2) : List[Int] now that the contractiveness > > condition has been removed, but that's a rather abusive use of > > implicit conversions. I need to have a play with it and see if I can > > come up with something in between. > > That's still horrible though. I suspect that whatever Martin and > Adriaan come up with will be rather more palatable. There's something of a limit to how much they can do and still retain enough flexibility. I certainly look forward to seeing what they come up with. > > I agree this makes the mainline case a little more awkward, and I need > > to have a think about how to improve that, but it makes a bunch of > > other cases much less so. > > Conversion between collections is (potentially) a pretty orthogonal > issue. It would be trivial to allow code such as: Well, it's true that you can add this functionality without the rest of my proposal and that it's probably worth doing so. Clearly we differ as to whether it's still worth trying to make sure everything is of the right type once one has done that. :-) > myNumbers.projection.map(_ * 2).as(FingerTree) > > And I don't even need any implicits to do it. True, the implicits can be avoided, but I prefer to use them for various reasons (mainly not having to care about the name of what provides the service and it avoids a hardcoded reference - if I want to change the behaviour for the local scope I can). > Incidentally, don't you think that your Builders look an awful lot like > the opposite of an Iterator? That is, following the direction you've > taken with Enumerable I imagine a somehow more 'pure' way to build a new > data structure would look like: > > object List extends Maker[List] { > def make[A](e : Enumerable[A]) = { > val b = new Listbuffer[A] > e.foreach(b += _) > b.toList > > } > } That's a very good idea actually. In particular it solves a number of things I've been worrying about (the main one being that if List overrides something so that it actually returns a List then as[List, T] would still copy it). Thanks. > > I'd certainly be ok with, for example, offering Iterable as a subtrait > > of Enumerable and otherwise carrying on as normal. If nothing else it > > would smooth the transition if (and when ;-) ) it did turn out that I > > was right, and sometimes it really is convenient to have an iterator > > (for example, it's really hard to zip two enumerables together). > > But we ought reasonably to be able to zip any ordered collections, > right? So they all need iterators to do so efficiently. This can effectively zip any two sequences of know size fairly efficiently. I don't feel that a lack of general purpose zipping is a great loss, although it does annoy me slightly. This is one of the things were the lack of continuations is a bit painful. > > > could use them as the basis for a practical collections hierarchy. > > > The same is likely true of any other simple solution. The problem > > > is not simple. > > > > Don't agree. Most problems are simple once you get down to the core. > > If by 'get down to the core' you mean 'discard half the requirements', > then yes I agree... Actually, that's exactly what I mean. Except that I would say "features" rather than requirements. If you minimize the set of what you want you can come up with much more effective abstractions. > To summarise, what I like: > > - A reduction in the importance of Iterators > - A mechanism for more general conversions between collections > > What I don't like: > > - Methods that return Enumerable rather than the expected type > - An assumption that any single access method will be applicable to > all data structures and operations. I don't assume that any single access method will be applicable to all data structures and operations. Like I said, you're clearly going to want to override these sometimes. I just think that this gets you most of the way there in a way that's a lot cleaner than requiring everything to return the right type and/or implement an iterator and for the cases where you do need more detailed operations you can choose how to get there explicitly. Anyway, I'm tinkering with the API here if you want to take a look (right now it's mostly just a slightly fleshed out version of what you've seen plus some stuff for sequences): http://freehg.org/u/DRMacIver/alt.collections/ It's mostly for playing with the API, but at some point I'll probably be adding more interesting collection types. Incidentally, there's a perfectly valid way of defining a conversion from a Traversable (what I renamed traversable to) to a Stream with almost the right laziness properties (it's a bit stricter in the head than it should be). Here's an (untested) implementation: object End extends Throwable; def toStream : Stream[T] = { val queue = new java.util.concurrent.SynchronousQueue[() => T]; spawn { try { foreach(x => { queue.put(() => x)); } } finally { queue.put(() => End); } } def next = try { val h = queue.get()(); new Stream[T] { val head = h; lazy val tail = next; } } catch { case End => Stream.empty } next; } Why yes, this is in fact 110% pure evil. I'm glad you noticed. ;-) |
|
|
Re: Towards a revised collection APIOn 2008-04-19 20:30:01 David MacIver wrote:
> On Sat, Apr 19, 2008 at 6:49 PM, Jamie Webb <j@...> > wrote: > > On 2008-04-19 12:26:10 David MacIver wrote: > > > I don't think you've made a convincing argument that performance > > > would suffer. > > > > My point is that performance would not improve, and that > > Don't agree. I'm not making this up - internal iteration really is > often faster, and the suggestion is not without precedent. It's at If you can show that your scheme of delaying evaluation and returning Traversables is capable of making a simple List.range(1, 100000).map(_ * 2) (returning a List) as fast as it is currently, I'll stop arguing about performance. > > If you implement e.g. map over a RandomAccessSeq using foreach, you > > lose random access. > > Well, yes. That's sortof the point of this design. If you want a > RandomAccessSeq you get one back out at the end. > > In particular the combination foo.map(bar).as[Array, Stuff] will end > up compiling to something basically equivalent to the override you'd > want to make. Sorry, I should have said RandomAccessSeq.Projection. If I write: (1 to 1000000).map(_ * 2)(50000) I'll get that one mapping and it will not calculate the other 999999. This is not possible under your scheme, except by reimplementing the relevant methods not to use foreach. > True, the implicits can be avoided, but I prefer to use them for > various reasons (mainly not having to care about the name of what > provides the service and it avoids a hardcoded reference - if I want > to change the behaviour for the local scope I can). But you can do that without implicits too, just using variables. They compile to the same thing, after all. The nice thing about avoiding implicits is that type inference works better. > This can effectively zip any two sequences of know size fairly > efficiently. Really? I don't see how, except by eagerly constructing an array. > > If by 'get down to the core' you mean 'discard half the > > requirements', then yes I agree... > > Actually, that's exactly what I mean. Except that I would say > "features" rather than requirements. If you minimize the set of what > you want you can come up with much more effective abstractions. Clearly I am more demanding than you. :-) > I don't assume that any single access method will be applicable to all > data structures and operations. Like I said, you're clearly going to > want to override these sometimes. I just think that this gets you most > of the way there in a way that's a lot cleaner than requiring > everything to return the right type and/or implement an iterator and > for the cases where you do need more detailed operations you can > choose how to get there explicitly. It seems to me that your theory that it's an advantage to have every operation return the same type is predicated on an assumption that it makes sense to treat data structures uniformly. > Anyway, I'm tinkering with the API here if you want to take a look > (right now it's mostly just a slightly fleshed out version of what > you've seen plus some stuff for sequences): > http://freehg.org/u/DRMacIver/alt.collections/ > > It's mostly for playing with the API, but at some point I'll probably > be adding more interesting collection types. I'll be interested to see those. /J |
|
|
Re: Towards a revised collection API> If you can show that your scheme of delaying evaluation and returning
> Traversables is capable of making a simple > > List.range(1, 100000).map(_ * 2) > > (returning a List) as fast as it is currently, I'll stop arguing about > performance. > I am very much in favor of delayed evaluation, can't say about the rest stuff yet... -- __~O -\ <, Christos KK Loverdos (*)/ (*) http://ckkloverdos.com |
|
|
Re: Towards a revised collection APIOn Mon, Apr 21, 2008 at 5:16 AM, Jamie Webb <j@...> wrote:
> On 2008-04-19 20:30:01 David MacIver wrote: > > On Sat, Apr 19, 2008 at 6:49 PM, Jamie Webb <j@...> > > wrote: > > > On 2008-04-19 12:26:10 David MacIver wrote: > > > > I don't think you've made a convincing argument that performance > > > > would suffer. > > > > > > My point is that performance would not improve, and that > > > > Don't agree. I'm not making this up - internal iteration really is > > often faster, and the suggestion is not without precedent. It's at > > If you can show that your scheme of delaying evaluation and returning > Traversables is capable of making a simple > > List.range(1, 100000).map(_ * 2) > > (returning a List) as fast as it is currently, I'll stop arguing about > performance. Some simple (and totally unscientific! I wouldn't believe their results except qualitatively) tests show mine is about 15% slower (Depending on the exact numbers I feed in and the value of a hardware random number generator sitting somewhere in Sun HQ, mine is sometimes faster. This happens sufficiently rarely that I'm assumine it's a quirk of when the garbage collector kicks in or an artifact of measurement or something). Curiously, my approach seems to consistently be very slightly faster for arrays. I think that might be because we get better control of when the magic snap to the right type happens and/or avoid it entirely. Not sure. The 15% slowdown is presumably because hotspot isn't quite the lean mean inlining machine we like to believe it is - they should inline to roughly the same thing, but List does it manually whileas mine has a few more method invocations to go through. A similar test on filter shows that way is a bit slower but not as much for lists and significantly faster for arrays (I'm slightly alarmed at *how* much faster - it looks to be about 30%. The array implementation must be doing something really odd. That should probably be considered a bug in the standard array implementation) In actual fact, the idea that caused me to come up with doing it this way was thinking about efficient implementation of a purely functional array language, so I'm not entirely surprised that it works well with arrays. I can't think of a good reason though. :-) You can take a look at the tests here: http://freehg.org/u/DRMacIver/alt.collections/file/53f65c4f9249/perftests.scala > > > If you implement e.g. map over a RandomAccessSeq using foreach, you > > > lose random access. > > > > Well, yes. That's sortof the point of this design. If you want a > > RandomAccessSeq you get one back out at the end. > > > > In particular the combination foo.map(bar).as[Array, Stuff] will end > > up compiling to something basically equivalent to the override you'd > > want to make. > > Sorry, I should have said RandomAccessSeq.Projection. If I write: > > (1 to 1000000).map(_ * 2)(50000) > > I'll get that one mapping and it will not calculate the other 999999. > This is not possible under your scheme, except by reimplementing the > relevant methods not to use foreach. That's a good point. I'll have a think about it. It's a slightly different one though in that this can just be done as a trait for behaviour. It's almost certainly worth doing, but it doesn't seem to be problematic to fit in with the rest of this. > > True, the implicits can be avoided, but I prefer to use them for > > various reasons (mainly not having to care about the name of what > > provides the service and it avoids a hardcoded reference - if I want > > to change the behaviour for the local scope I can). > > But you can do that without implicits too, just using variables. It's not like using implicits means you can't call it directly with the target the implicit would have used. > They compile to the same thing, after all. The nice thing about > avoiding implicits is that type inference works better. Yes, but the data inference is significantly worse. :-) > > This can effectively zip any two sequences of know size fairly > > efficiently. > > Really? I don't see how, except by eagerly constructing an array. Well, that's reasonably efficient. :-) > > Actually, that's exactly what I mean. Except that I would say > > "features" rather than requirements. If you minimize the set of what > > you want you can come up with much more effective abstractions. > > Clearly I am more demanding than you. :-) And consequently you get fewer cool toys to play with. :-) "No! I don't want that toy truck. I want a RED one!" More seriously, the more you pare down the requirements the wider the range of things that implement them is. Hence, cool toys. > > I don't assume that any single access method will be applicable to all > > data structures and operations. Like I said, you're clearly going to > > want to override these sometimes. I just think that this gets you most > > of the way there in a way that's a lot cleaner than requiring > > everything to return the right type and/or implement an iterator and > > for the cases where you do need more detailed operations you can > > choose how to get there explicitly. > > It seems to me that your theory that it's an advantage to have every > operation return the same type is predicated on an assumption that it > makes sense to treat data structures uniformly. No, it's predicated on the assumption that it makes sense to treat them uniformly right up to the point where it makes sense to to otherwise. It's also predicated on the observation that there are quite a few cases where you *have* to do this. So we might as well make it as painless as possible to do. :-) > > Anyway, I'm tinkering with the API here if you want to take a look > > (right now it's mostly just a slightly fleshed out version of what > > you've seen plus some stuff for sequences): > > http://freehg.org/u/DRMacIver/alt.collections/ > > > > It's mostly for playing with the API, but at some point I'll probably > > be adding more interesting collection types. > > I'll be interested to see those. I'll give you a shout when any implementations worth looking at are up there. :-) |
|
|
Re: Towards a revised collection APIOn Mon, Apr 21, 2008 at 11:26 AM, David MacIver <david.maciver@...> wrote:
> A similar test on filter shows that way is a bit slower but not as > much for lists and significantly faster for arrays (I'm slightly > alarmed at *how* much faster - it looks to be about 30%. The array > implementation must be doing something really odd. That should > probably be considered a bug in the standard array implementation) I've filed a bug about this. Looks like it's just a slow implementation. I expect once that's fixed the numbers will be closer. http://lampsvn.epfl.ch/trac/scala/ticket/783 |
|
|
Re: Towards a revised collection APIOn Mon, Apr 21, 2008 at 11:26 AM, David MacIver <david.maciver@...> wrote:
> You can take a look at the tests here: > http://freehg.org/u/DRMacIver/alt.collections/file/53f65c4f9249/perftests.scala The astute observer will note that I shouldn't be allowed to write performance tests while half awake. These are backwards... Sigh. |
|
|
Re: Towards a revised collection APIOn 2008-04-21 14:26:32 David MacIver wrote:
> On Mon, Apr 21, 2008 at 11:26 AM, David MacIver > <david.maciver@...> wrote: > > You can take a look at the tests here: > > http://freehg.org/u/DRMacIver/alt.collections/file/53f65c4f9249/perftests.scala > > The astute observer will note that I shouldn't be allowed to write > performance tests while half awake. These are backwards... Not only that, you're probably measuring interpreted performance, and the work will probably be dwarfed by GC time... Here are the numbers I get (-s is Scala, -d is you): listmap-s listmap-d arraymap-s arraymap-d listfilter-s listfilter-d arrayfilter-s arrayfilter-d ------------------------------------------------------------------------------------------------------------- 901 1133 1124 1778 586 718 705 1592 904 1132 1117 1771 605 737 730 1555 912 1126 1110 1768 602 744 715 1569 923 1110 1087 1785 601 744 731 1560 897 1135 1106 1837 609 734 737 1592 897 1129 1108 1774 605 725 727 1548 892 1160 1148 1784 612 733 707 1565 942 1186 1100 1745 589 764 720 1606 909 1160 1141 1734 589 733 720 1553 902 1156 1131 1782 588 733 722 1574 ------------------------------------------------------------------------------------------------------------- Avg 908 ms 1143 ms 1117 ms 1776 ms 599 ms 736 ms 721 ms 1571 ms ------------------------------------------------------------------------------------------------------------- This is for a list/array of 1000 elements, with timings given for 45000 repetitions. Sorry, you can't repeat it until I publish my benchmarking library in Scalax (need to update dependencies; I'll try to do it tomorrow). /J |
|
|
Re: Towards a revised collection APIOn Mon, Apr 21, 2008 at 3:27 PM, Jamie Webb <j@...> wrote:
> > On 2008-04-21 14:26:32 David MacIver wrote: > > On Mon, Apr 21, 2008 at 11:26 AM, David MacIver > > <david.maciver@...> wrote: > > > You can take a look at the tests here: > > > http://freehg.org/u/DRMacIver/alt.collections/file/53f65c4f9249/perftests.scala > > > > The astute observer will note that I shouldn't be allowed to write > > performance tests while half awake. These are backwards... > > Not only that, you're probably measuring interpreted performance, and Hm. that's possible. Because each test gets compiled into a separate method and run multiple times, I assumed hotspot would JIT compile the method after the first few invocations, but it should really be running a bunch of cold starts first I suppose. I freely admit to not being very good at benchmarking Java programs though, so I could have completely the wrong end of the stick. :-) > the work will probably be dwarfed by GC time... Well, I force a garbage collection between tests, so any GC time that is measured should actually be significant numbers. > Here are the numbers I get (-s is Scala, -d is you): > > listmap-s listmap-d arraymap-s arraymap-d listfilter-s listfilter-d arrayfilter-s arrayfilter-d > ------------------------------------------------------------------------------------------------------------- > 901 1133 1124 1778 586 718 705 1592 > 904 1132 1117 1771 605 737 730 1555 > 912 1126 1110 1768 602 744 715 1569 > 923 1110 1087 1785 601 744 731 1560 > 897 1135 1106 1837 609 734 737 1592 > 897 1129 1108 1774 605 725 727 1548 > 892 1160 1148 1784 612 733 707 1565 > 942 1186 1100 1745 589 764 720 1606 > 909 1160 1141 1734 589 733 720 1553 > 902 1156 1131 1782 588 733 722 1574 > ------------------------------------------------------------------------------------------------------------- > Avg 908 ms 1143 ms 1117 ms 1776 ms 599 ms 736 ms 721 ms 1571 ms > ------------------------------------------------------------------------------------------------------------- > > This is for a list/array of 1000 elements, with timings given for 45000 > repetitions. That does look rather bleaker than I'd hoped. :-) I don't find the list numbers too bad, particularly as I expect they'll improve slightly if List implements traversable directly rather than using a wrapper. If the performance works well in other cases (which has yet to be tested) I'm ok with that level of slowdown for something like this - given that the design is intended to eliminate intermediate structures it's not surprising it doesn't give much advantage when there are none. The array numbers are more of an issue, but I think I know how to improve those. How do the numbers change if you increase the size of the list/array? > Sorry, you can't repeat it until I publish my benchmarking library in > Scalax (need to update dependencies; I'll try to do it tomorrow). I look forward to it. |
| < Prev | 1 - 2 | Next > |
| Free embeddable forum powered by Nabble | Forum Help |