« Return to Thread: [scala] New collections (slice/drop) problem
Hi Ivan,Ivan Tarasov wrote:I'm implementing some binary stream data parsing, and I approached it in the
following way:
1. Data is read into Array[Byte] (when I have access to all the data I
need).
2. The array is passed into a function deserialize(data: Seq[Byte]):
Seq[Record], which parses the bytes into records.
3. deserialize function has a tail recursive function
deserializeRecord(data: Seq[Byte], records: Queue[Record]): Seq[Record]
inside which parses a single Record and enqueues it into the end of records
queue (initially empty).
4. deserializeRecord tail-calls itself:
deserializeRecord(data.drop(amountOfParsedBytes),
records.enqueue(parsedRecord))
The initial data array size is around 256kb, each record is around 6 bytes.
The first problem I found was that data.drop call was extremely slow, likely
because a new array was created on each drop and the data was copied (I
didn't expect that).
Then I started passing a view of my initial array:
deserialize(dataBytesArray.view). That increased the performance quite a bit
(although it's still bad), but introduced another problem: I started getting
StackOverflowError on calls to data in deserializeRecord when a lot of drops
have been done.
Example:
Welcome to Scala version 2.8.0.r17998-b20090605125720 (Java HotSpot(TM)
> Server VM, Java 1.6.0_12).
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> var a = new Array[Byte](200000).view
> a: java.lang.Object with
> scala.collection.generic.MutableVectorView[Byte,scala.collection.mutable.Vector[Byte]]
> = MutableVectorTemplate(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
> scala> (1 to 10000).foreach { i => a = a.drop(6) }
>
> scala> a
> java.lang.StackOverflowError
> at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
> at
> scala.collection.generic.TraversableViewTemplate$Sliced$class.stringPrefix(TraversableViewTemplate.scala:54)
> at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
> at scala.collection....
> scala> a.length
> java.lang.StackOverflowError
> at
> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
> at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.length(MutableVectorViewTemplate.scala:61)
> at
> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
> at scala.collection.generic.MutableVectorViewTemp...
> scala> a(1)
> java.lang.StackOverflowError
> at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
> at
> scala.collection.generic.SequenceViewTemplate$Sliced$class.apply(SequenceViewTemplate.scala:36)
> at
> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
> at scala.collection.generic.SequenceViewTemplat...
>
As far as I understand, the slice and drop functions do not try to flatten
the data access path, forwarding all the calls to the immediate parent slice
instead of the original datastructure. This basically means that the slice
is unusable and doesn't scale well (at least performance-wise).
Ivan
« Return to Thread: [scala] New collections (slice/drop) problem
| Free embeddable forum powered by Nabble | Forum Help |