« Return to Thread: [scala] New collections (slice/drop) problem

Re: [scala] New collections (slice/drop) problem

by Martin Odersky :: Rate this Message:

Reply to Author | View in Thread

Hi Ivan,

What you describe is a bug. View slices are supposed to compactify
their start and end indices, so that you should not see a stack
overflow. I'll look into it.

Cheers

 -- Martin

On Tue, Jun 23, 2009 at 8:29 PM, Ivan Tarasov<ivan.tarasov@...> wrote:

> I'm using one of the recent development builds of 2.8.0 (so I'm using the
> new collections API). Do you suggest upgrading to the latest?
>
> I don't like the idea of "not dropping and passing the index" — that's quite
> ugly and defeats the purpose of the APIs.
>
> To me it looks like a bug in the Scala collections classes, however I wanted
> to make sure that I'm not doing anything stupid (or missing some obvious
> not-ugly solution) before I submit a bug report.
>
> Ivan
>
> On Tue, Jun 16, 2009 at 12:38 AM, Landei <Daniel.Gronau@...> wrote:
>>
>>
>>
>> Ivan Tarasov wrote:
>> >
>> > I'm implementing some binary stream data parsing, and I approached it in
>> > the
>> > following way:
>> >
>> > 1. Data is read into Array[Byte] (when I have access to all the data I
>> > need).
>> > 2. The array is passed into a function deserialize(data: Seq[Byte]):
>> > Seq[Record], which parses the bytes into records.
>> > 3. deserialize function has a tail recursive function
>> > deserializeRecord(data: Seq[Byte], records: Queue[Record]): Seq[Record]
>> > inside which parses a single Record and enqueues it into the end of
>> > records
>> > queue (initially empty).
>> > 4. deserializeRecord tail-calls itself:
>> > deserializeRecord(data.drop(amountOfParsedBytes),
>> > records.enqueue(parsedRecord))
>> >
>> > The initial data array size is around 256kb, each record is around 6
>> > bytes.
>> > The first problem I found was that data.drop call was extremely slow,
>> > likely
>> > because a new array was created on each drop and the data was copied (I
>> > didn't expect that).
>> >
>> > Then I started passing a view of my initial array:
>> > deserialize(dataBytesArray.view). That increased the performance quite a
>> > bit
>> > (although it's still bad), but introduced another problem: I started
>> > getting
>> > StackOverflowError on calls to data in deserializeRecord when a lot of
>> > drops
>> > have been done.
>> >
>> > Example:
>> >
>> > Welcome to Scala version 2.8.0.r17998-b20090605125720 (Java HotSpot(TM)
>> >> Server VM, Java 1.6.0_12).
>> >> Type in expressions to have them evaluated.
>> >> Type :help for more information.
>> >>
>> >> scala> var a = new Array[Byte](200000).view
>> >> a: java.lang.Object with
>> >>
>> >> scala.collection.generic.MutableVectorView[Byte,scala.collection.mutable.Vector[Byte]]
>> >> = MutableVectorTemplate(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> >> 0,
>> >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> >> 0,
>> >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> >> 0,
>> >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
>> >> scala> (1 to 10000).foreach { i => a = a.drop(6) }
>> >>
>> >> scala> a
>> >> java.lang.StackOverflowError
>> >>     at
>> >>
>> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
>> >>     at
>> >>
>> >> scala.collection.generic.TraversableViewTemplate$Sliced$class.stringPrefix(TraversableViewTemplate.scala:54)
>> >>     at
>> >>
>> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
>> >>     at scala.collection....
>> >> scala> a.length
>> >> java.lang.StackOverflowError
>> >>     at
>> >>
>> >> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
>> >>     at
>> >>
>> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.length(MutableVectorViewTemplate.scala:61)
>> >>     at
>> >>
>> >> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
>> >>     at scala.collection.generic.MutableVectorViewTemp...
>> >> scala> a(1)
>> >> java.lang.StackOverflowError
>> >>     at
>> >>
>> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
>> >>     at
>> >>
>> >> scala.collection.generic.SequenceViewTemplate$Sliced$class.apply(SequenceViewTemplate.scala:36)
>> >>     at
>> >>
>> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
>> >>     at scala.collection.generic.SequenceViewTemplat...
>> >>
>> >
>> > As far as I understand, the slice and drop functions do not try to
>> > flatten
>> > the data access path, forwarding all the calls to the immediate parent
>> > slice
>> > instead of the original datastructure. This basically means that the
>> > slice
>> > is unusable and doesn't scale well (at least performance-wise).
>> >
>> > Ivan
>> >
>> >
>>
>> Hi Ivan,
>>
>> without more code it's hard to give you a hint. An obvious but ugly
>> solution
>> would be to drop nothing from the array but to add a running index. Or can
>> you change the type of the initial data structure to something more "drop
>> friendly" (e.g a queue)? Could you try out Scala 2.8 with the new
>> collections (which might solve the problem)?
>>
>> Privyet,
>> Daniel
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/-scala--New-collections-%28slice-drop%29-problem-tp24040606p24049217.html
>> Sent from the Scala mailing list archive at Nabble.com.
>>
>
>

 « Return to Thread: [scala] New collections (slice/drop) problem