« Return to Thread: [scala] New collections (slice/drop) problem

Re: [scala] New collections (slice/drop) problem

by Landei :: Rate this Message:

Reply to Author | View in Thread


Ivan Tarasov wrote:
I'm implementing some binary stream data parsing, and I approached it in the
following way:

1. Data is read into Array[Byte] (when I have access to all the data I
need).
2. The array is passed into a function deserialize(data: Seq[Byte]):
Seq[Record], which parses the bytes into records.
3. deserialize function has a tail recursive function
deserializeRecord(data: Seq[Byte], records: Queue[Record]): Seq[Record]
inside which parses a single Record and enqueues it into the end of records
queue (initially empty).
4. deserializeRecord tail-calls itself:
deserializeRecord(data.drop(amountOfParsedBytes),
records.enqueue(parsedRecord))

The initial data array size is around 256kb, each record is around 6 bytes.
The first problem I found was that data.drop call was extremely slow, likely
because a new array was created on each drop and the data was copied (I
didn't expect that).

Then I started passing a view of my initial array:
deserialize(dataBytesArray.view). That increased the performance quite a bit
(although it's still bad), but introduced another problem: I started getting
StackOverflowError on calls to data in deserializeRecord when a lot of drops
have been done.

Example:

Welcome to Scala version 2.8.0.r17998-b20090605125720 (Java HotSpot(TM)
> Server VM, Java 1.6.0_12).
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> var a = new Array[Byte](200000).view
> a: java.lang.Object with
> scala.collection.generic.MutableVectorView[Byte,scala.collection.mutable.Vector[Byte]]
> = MutableVectorTemplate(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
> scala> (1 to 10000).foreach { i => a = a.drop(6) }
>
> scala> a
> java.lang.StackOverflowError
>     at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
>     at
> scala.collection.generic.TraversableViewTemplate$Sliced$class.stringPrefix(TraversableViewTemplate.scala:54)
>     at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
>     at scala.collection....
> scala> a.length
> java.lang.StackOverflowError
>     at
> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
>     at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.length(MutableVectorViewTemplate.scala:61)
>     at
> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
>     at scala.collection.generic.MutableVectorViewTemp...
> scala> a(1)
> java.lang.StackOverflowError
>     at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
>     at
> scala.collection.generic.SequenceViewTemplate$Sliced$class.apply(SequenceViewTemplate.scala:36)
>     at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
>     at scala.collection.generic.SequenceViewTemplat...
>

As far as I understand, the slice and drop functions do not try to flatten
the data access path, forwarding all the calls to the immediate parent slice
instead of the original datastructure. This basically means that the slice
is unusable and doesn't scale well (at least performance-wise).

Ivan
Hi Ivan,

without more code it's hard to give you a hint. An obvious but ugly solution would be to drop nothing from the array but to add a running index. Or can you change the type of the initial data structure to something more "drop friendly" (e.g a queue)? Could you try out Scala 2.8 with the new collections (which might solve the problem)?

Privyet,
Daniel

 « Return to Thread: [scala] New collections (slice/drop) problem