[scala] Streams

View: New views
13 Messages — Rating Filter:   Alert me  

[scala] Streams

by puzzler :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I was under the impression that if you access, say, the 10000-th
element of a Stream, then the subsequent access to the 10000-th
element should be considerably faster.  This is because the first
time, the stream is actually building the list of elements on demand.
The second time, it is merely a lookup in the already-built list.

But, when I test this out, it appears to be taking equally _long_ for
both accesses.

Is there a "timing" function in Scala that lets me easily test the
time spent executing a function?

Speaking of Streams, I noticed that:
a) The Stream.from method is not documented in the class
documentation.  Why is that?  Isn't the class documentation
automatically generated from the source?
b) The Stream.from method only takes ints, not BigInts, floats, etc.
This seems kind of silly, because it's perfectly reasonable to want to
generate a stream of all even BigInts, or all the floats counting by
.5.  The from method should support any type that has a + method.  Is
there already such a "type class" built into Scala, or does one need
to write his own, and build implicit conversion functions for all the
numeric types, in order to get something simple like this to work?

Thanks,

Mark

[scala] Re: Streams

by puzzler :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> a) The Stream.from method is not documented in the class
> documentation.  Why is that?  Isn't the class documentation
> automatically generated from the source?

I see now that it is in the Stream object documentation, rather than
the class documentation.  I keep missing that classes are effectively
documented in two places, in the "class" and in the "object".

--Mark

Re: [scala] Re: Streams

by Jon Pretty :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Mark,

Mark Engelberg wrote:
> I see now that it is in the Stream object documentation, rather than
> the class documentation.  I keep missing that classes are effectively
> documented in two places, in the "class" and in the "object".

I think this only seems unnatural in light of Java's approach.  If you consider that dereferencing
is always done in the form <object name>.<member name> and never <class name>.<member name>, Scala
is consistent.  The only slight confusion is that we have an object and a class with different names
(albeit in different namespaces).

Cheers,
Jon

--
Jon Pretty | Sygneca Ltd.

Re: [scala] Re: Streams

by Henrik Huttunen :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi.
> I see now that it is in the Stream object documentation, rather than
> the class documentation.  I keep missing that classes are effectively
> documented in two places, in the "class" and in the "object".
Here's a search facility that may be of help later:
http://scripts.mit.edu/~y_z/sds/
(I haven't made it)

- Henrik

Re: [scala] Re: Streams

by Jon Pretty :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jon Pretty wrote:
> I think this only seems unnatural in light of Java's approach.  If you
> consider that dereferencing is always done in the form <object
> name>.<member name> and never <class name>.<member name>, Scala is
> consistent.  The only slight confusion is that we have an object and a
> class with different names (albeit in different namespaces).

I clearly didn't get enough sleep last night.  Here's what I meant to say:

The only slight confusion is that we have an object and a class with the *same name* (albeit in
different namespaces).

Jon

--
Jon Pretty | Sygneca Ltd.

Re: [scala] Streams

by tmorris :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Mark Engelberg wrote:
> I was under the impression that if you access, say, the 10000-th
> element of a Stream, then the subsequent access to the 10000-th
> element should be considerably faster.  This is because the first
> time, the stream is actually building the list of elements on demand.
> The second time, it is merely a lookup in the already-built list.

I wouldn't have expected this. This is often called "memois(z)ation" and
is an irreversible space/computation trade in favour of computational
advantage at the expense of space. It becomes irreversible at this
point, therefore, the current behaviour is the correct one - unless you
wish to suggest that in some areas Scala is eager by default and so it
is inconsistent here.

In any case, if the behaviour that you describe was built-in to the
Stream class, then this would mean yet *another* type that I would have
to rewrite and I'm getting tired of this :) You should build a memo
table outside of Stream yourself - perhaps create a type called
MemoStream on top of Stream that does what you want. Note that it is
impossible to write a "UnmemoStream" (lazy) on top of "MemoStream"
(strict), which is why your described behaviour would require a rewrite.

Here is a good start (an excerpt of my code for doing what you describe):

import scala.collection.mutable.HashMap
import scala.collection.immutable.TreeMap

// permits destructive updates without violating referential
transparency (except within the local context)
trait Memo[k, v] {
  def get(f: k => v)(k: => k): v
}

object Memo {
  def arrayMemo[k, v](n: Int) = new Memo[Int, v] {
    val a: Array[v] = new Array(n)

    def get(f: Int => v)(i: => Int) = if(i < 0 || i >= a.length)
error("out of range: " + i)
      else if(a(i) == null) {
        val x = f(i)
        a(i) = x
        x
      } else a(i)
  }

  def nilMemo[k, v] = new Memo[k, v] {
    def get(f: k => v)(k: => k) = f(k)
  }

  def hashMemo[k, v] = new Memo[k, v] {
    val m = new HashMap[k, v]

    def get(f: k => v)(i: => k) = {
      val o = m.get(i)

      o match {
        case Some(v) => v
        case None => {
          val x = f(i)
          m.update(i, x)
          x
        }
      }
    }
  }

  def treeMemo[k, v](implicit o: k => Ordered[k]) = new Memo[k, v] {
    var m = new TreeMap[k, v]

    def get(f: k => v)(i: => k) = {
      val o = m.get(i)

      o match {
        case Some(v) => v
        case None => {
          val x = f(i)
          m = m.update(i, x)
          x
        }
      }
    }
  }
}


>
> But, when I test this out, it appears to be taking equally _long_ for
> both accesses.
>
> Is there a "timing" function in Scala that lets me easily test the
> time spent executing a function?
>
> Speaking of Streams, I noticed that:
> a) The Stream.from method is not documented in the class
> documentation.  Why is that?  Isn't the class documentation
> automatically generated from the source?
> b) The Stream.from method only takes ints, not BigInts, floats, etc.
> This seems kind of silly, because it's perfectly reasonable to want to
> generate a stream of all even BigInts, or all the floats counting by
> .5.  The from method should support any type that has a + method.  Is
> there already such a "type class" built into Scala, or does one need
> to write his own, and build implicit conversion functions for all the
> numeric types, in order to get something simple like this to work?
>
> Thanks,
>
> Mark
>
>
>

Tony Morris
http://tmorris.net/





signature.asc (198 bytes) Download Attachment

Re: [scala] Streams

by puzzler :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 6/21/07, Tony Morris <tmorris@...> wrote:

> Mark Engelberg wrote:
> > I was under the impression that if you access, say, the 10000-th
> > element of a Stream, then the subsequent access to the 10000-th
> > element should be considerably faster.  This is because the first
> > time, the stream is actually building the list of elements on demand.
> > The second time, it is merely a lookup in the already-built list.
>
> I wouldn't have expected this. This is often called "memois(z)ation" and
> is an irreversible space/computation trade in favour of computational
> advantage at the expense of space. It becomes irreversible at this
> point, therefore, the current behaviour is the correct one - unless you
> wish to suggest that in some areas Scala is eager by default and so it
> is inconsistent here.

In Haskell, lazy lists work as I describe.  In F#, lazy lists work as
I describe.  (F# also has an uncached lazy list called Seq, sort of
like a functional "iterator").

In general, languages need both constructs.  I was under the
impression that in Scala, the Iterator class fills the need for an
uncached construct, and the Stream class fills the need for a cached
lazy list that is the norm in functional languages.

If Stream doesn't cache its values, why bother having both Stream and Iterator?

--Mark

Re: [scala] Streams

by tmorris :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Mark Engelberg wrote:
> On 6/21/07, Tony Morris <tmorris@...> wrote:
>> Mark Engelberg wrote:
>> > I was under the impression that if you access, say, the 10000-th
>> > element of a Stream, then the subsequent access to the 10000-th
>> > element should be considerably faster.
[snip]
> In Haskell, lazy lists work as I describe.  

Absolutely not (by the way, some Haskell purists would consider this
accusation blasphemy)

xs !! n will always run in O(n), always

> In F#, lazy lists work as
> I describe.

I haven't used F# enough to know, but I'd hope note.

> (F# also has an uncached lazy list called Seq, sort of
> like a functional "iterator").
>
> In general, languages need both constructs.  

They need a "completely" lazy list at the very least. All other
behaviour can be written to exist on top as generalisation is lost
(including the one that you initially described and the behaviour of the
F# list that you describe).

> I was under the
> impression that in Scala, the Iterator class fills the need for an
> uncached construct, and the Stream class fills the need for a cached
> lazy list that is the norm in functional languages.

It is already the norm, it never memoises.

>
> If Stream doesn't cache its values, why bother having both Stream and
> Iterator?

One is pure, the other is imperative (Iterator). You might notice that
Iterator.next is side-effecting. Imagine changing its next method to
return a tuple containing an Iterator instead. i.e. instead of returning
a T (for Iterator[T]), it returns a (T, Iterator[T]).

What do we have now? A stream! next._1 is head and next._2 is tail.

A better question might be "why bother having Iterators (or imperative
constructs)?". I suspect the answer has something to do with Java and
targetting the JVM i.e. a flawed premise.

>
> --Mark
>
>
>


Tony Morris
http://tmorris.net/





signature.asc (198 bytes) Download Attachment

[scala] Re: Streams

by Eric Willigers :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Tony Morris wrote:

> Mark Engelberg wrote:
>> I was under the impression that if you access, say, the 10000-th
>> element of a Stream, then the subsequent access to the 10000-th
>> element should be considerably faster.  This is because the first
>> time, the stream is actually building the list of elements on demand.
>> The second time, it is merely a lookup in the already-built list.
>
> I wouldn't have expected this. This is often called "memois(z)ation" and
> is an irreversible space/computation trade in favour of computational
> advantage at the expense of space.


The stream is constructed the first time. A linked list is traversed the
second time (unless you memoize - then it's just a look-up).  The second
time will already be considerably faster in cases where it takes a long
time to determine each stream node.



Here's a simple test with a timing method.


     def reportTime[T](description: String, computation: => T): T =
     {
         val startTime = System.nanoTime // or use System.currentTimeMillis

         val result: T = computation

         Console.println(description+" took "+(System.nanoTime -
startTime)+" ns")

         result
     }

     // like Stream.from, but using BigInt and having a sleep before
each cons
     def slowFrom(start: BigInt, step: BigInt): Stream[BigInt] =
     {
         Thread.sleep(1)
         Stream.cons(start, slowFrom(start+step, step))
     }

     val numberStream = slowFrom(0, 1)

     Console.println(reportTime("first  access", numberStream(10000)))
     Console.println(reportTime("second access", numberStream(10000)))


first  access took 19610635366 ns
10000
second access took 650975 ns
10000


Without the Thread.sleep(1):

first  access took 25804097 ns
10000
second access took 707413 ns
10000


(Note that the precision is misleading; ideally we'd be averaging over
many runs and reporting in seconds or milliseconds.)



>> b) The Stream.from method only takes ints, not BigInts, floats, etc.
>> This seems kind of silly, because it's perfectly reasonable to want to
>> generate a stream of all even BigInts, or all the floats counting by
>> .5.  The from method should support any type that has a + method.  Is
>> there already such a "type class" built into Scala, or does one need
>> to write his own, and build implicit conversion functions for all the
>> numeric types, in order to get something simple like this to work?

I'm not aware of one. Perhaps the design could be explored in a
different thread.


Re: [scala] Streams

by puzzler :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 6/21/07, Tony Morris <tmorris@...> wrote:

> Mark Engelberg wrote:
> > On 6/21/07, Tony Morris <tmorris@...> wrote:
> >> Mark Engelberg wrote:
> >> > I was under the impression that if you access, say, the 10000-th
> >> > element of a Stream, then the subsequent access to the 10000-th
> >> > element should be considerably faster.
> [snip]
> > In Haskell, lazy lists work as I describe.
>
> Absolutely not (by the way, some Haskell purists would consider this
> accusation blasphemy)
>
> xs !! n will always run in O(n), always
>

Yes, it is always O(n), because it still needs to traverse the list.
However, it does not recompute the elements of the list.  So if the
definition of how the list is constructed is computationally
intensive, you notice a dramatic speedup in Haskell when you do xs!!n
the second time.

--Mark

Re: [scala] Re: Streams

by puzzler :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks, Eric.  It's nice to know that Stream should work the way I
expect.  I'll use your timing code on some of my own tests, and try to
figure out why I didn't seem to be getting a speedup on subsequent
accesses.

--Mark

Re: [scala] Streams

by tmorris :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Mark Engelberg wrote:

> On 6/21/07, Tony Morris <tmorris@...> wrote:
>> Mark Engelberg wrote:
>> > On 6/21/07, Tony Morris <tmorris@...> wrote:
>> >> Mark Engelberg wrote:
>> >> > I was under the impression that if you access, say, the 10000-th
>> >> > element of a Stream, then the subsequent access to the 10000-th
>> >> > element should be considerably faster.
>> [snip]
>> > In Haskell, lazy lists work as I describe.
>>
>> Absolutely not (by the way, some Haskell purists would consider this
>> accusation blasphemy)
>>
>> xs !! n will always run in O(n), always
>>
>
> Yes, it is always O(n), because it still needs to traverse the list.
> However, it does not recompute the elements of the list.  So if the
> definition of how the list is constructed is computationally
> intensive, you notice a dramatic speedup in Haskell when you do xs!!n
> the second time.
>
> --Mark
>
>
>
This is a different matter. But back to your original point, you are
describing exactly how Streams work. Notice that the first argument to
cons.apply[1] is declared without a laziness annotation (=>).

[1]
http://scalasvn.epfl.ch/cgi-bin/viewvc.cgi/scala/trunk/src/library/scala/Stream.scala?view=markup

This means that the following code behaves the way it does. Hope this helps.

object F {
  def main(args: Array[String]) = {
    import Stream._

    val x = cons(1, cons(2, cons(3, empty)))
    def y = x.map(x => { Console.println("effect: " + x); x + 2 })
    val z = y

    for(val s <- y)
      Console.println(s)

    Console.println("------------------")

    for(val s <- y)
      Console.println(s)

    Console.println("------------------")

    for(val s <- z)
      Console.println(s)

    Console.println("------------------")

    for(val s <- z)
      Console.println(s)
  }
}

$ scala F
effect: 1
effect: 1
3
effect: 2
4
effect: 3
5
------------------
effect: 1
3
effect: 2
4
effect: 3
5
------------------
3
effect: 2
4
effect: 3
5
------------------
3
4
5


Tony Morris
http://tmorris.net/






signature.asc (198 bytes) Download Attachment

Re: [scala] Re: Streams

by puzzler :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I found my problem.

I was defining my stream for testing as:

def myStream = slowFrom(0,1)

I didn't realize this was making myStream a function that creates a
new stream each time it is invoked.  I needed to write:

val myStream = slowFrom(0,1)

Now I see the speedup I expected from subsequent accesses.

Thanks everyone!

On 6/21/07, Mark Engelberg <mark.engelberg@...> wrote:
> Thanks, Eric.  It's nice to know that Stream should work the way I
> expect.  I'll use your timing code on some of my own tests, and try to
> figure out why I didn't seem to be getting a speedup on subsequent
> accesses.
>
> --Mark
>