« Return to Thread: [scala] New collections (slice/drop) problem

[scala] New collections (slice/drop) problem

by Ivan Tarasov :: Rate this Message:

Reply to Author | View in Thread

I'm implementing some binary stream data parsing, and I approached it in the following way:

1. Data is read into Array[Byte] (when I have access to all the data I need).
2. The array is passed into a function deserialize(data: Seq[Byte]): Seq[Record], which parses the bytes into records.
3. deserialize function has a tail recursive function deserializeRecord(data: Seq[Byte], records: Queue[Record]): Seq[Record] inside which parses a single Record and enqueues it into the end of records queue (initially empty).
4. deserializeRecord tail-calls itself: deserializeRecord(data.drop(amountOfParsedBytes), records.enqueue(parsedRecord))

The initial data array size is around 256kb, each record is around 6 bytes. The first problem I found was that data.drop call was extremely slow, likely because a new array was created on each drop and the data was copied (I didn't expect that).

Then I started passing a view of my initial array: deserialize(dataBytesArray.view). That increased the performance quite a bit (although it's still bad), but introduced another problem: I started getting StackOverflowError on calls to data in deserializeRecord when a lot of drops have been done.

Example:

Welcome to Scala version 2.8.0.r17998-b20090605125720 (Java HotSpot(TM) Server VM, Java 1.6.0_12).
Type in expressions to have them evaluated.
Type :help for more information.

scala> var a = new Array[Byte](200000).view
a: java.lang.Object with scala.collection.generic.MutableVectorView[Byte,scala.collection.mutable.Vector[Byte]] = MutableVectorTemplate(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
scala> (1 to 10000).foreach { i => a = a.drop(6) }

scala> a
java.lang.StackOverflowError
    at scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
    at scala.collection.generic.TraversableViewTemplate$Sliced$class.stringPrefix(TraversableViewTemplate.scala:54)
    at scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
    at scala.collection....
scala> a.length
java.lang.StackOverflowError
    at scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
    at scala.collection.generic.MutableVectorViewTemplate$$anon$2.length(MutableVectorViewTemplate.scala:61)
    at scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
    at scala.collection.generic.MutableVectorViewTemp...
scala> a(1)
java.lang.StackOverflowError
    at scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
    at scala.collection.generic.SequenceViewTemplate$Sliced$class.apply(SequenceViewTemplate.scala:36)
    at scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
    at scala.collection.generic.SequenceViewTemplat...

As far as I understand, the slice and drop functions do not try to flatten the data access path, forwarding all the calls to the immediate parent slice instead of the original datastructure. This basically means that the slice is unusable and doesn't scale well (at least performance-wise).

Ivan

 « Return to Thread: [scala] New collections (slice/drop) problem