« Return to Thread: [scala] New collections (slice/drop) problem
Hi Ivan,
Ivan Tarasov wrote:
>
> I'm implementing some binary stream data parsing, and I approached it in
> the
> following way:
>
> 1. Data is read into Array[Byte] (when I have access to all the data I
> need).
> 2. The array is passed into a function deserialize(data: Seq[Byte]):
> Seq[Record], which parses the bytes into records.
> 3. deserialize function has a tail recursive function
> deserializeRecord(data: Seq[Byte], records: Queue[Record]): Seq[Record]
> inside which parses a single Record and enqueues it into the end of
> records
> queue (initially empty).
> 4. deserializeRecord tail-calls itself:
> deserializeRecord(data.drop(amountOfParsedBytes),
> records.enqueue(parsedRecord))
>
> The initial data array size is around 256kb, each record is around 6
> bytes.
> The first problem I found was that data.drop call was extremely slow,
> likely
> because a new array was created on each drop and the data was copied (I
> didn't expect that).
>
> Then I started passing a view of my initial array:
> deserialize(dataBytesArray.view). That increased the performance quite a
> bit
> (although it's still bad), but introduced another problem: I started
> getting
> StackOverflowError on calls to data in deserializeRecord when a lot of
> drops
> have been done.
>
> Example:
>
> Welcome to Scala version 2.8.0.r17998-b20090605125720 (Java HotSpot(TM)
>> Server VM, Java 1.6.0_12).
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>>
>> scala> var a = new Array[Byte](200000).view
>> a: java.lang.Object with
>> scala.collection.generic.MutableVectorView[Byte,scala.collection.mutable.Vector[Byte]]
>> = MutableVectorTemplate(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0,
>> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0,
>> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> 0,
>> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
>> scala> (1 to 10000).foreach { i => a = a.drop(6) }
>>
>> scala> a
>> java.lang.StackOverflowError
>> at
>> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
>> at
>> scala.collection.generic.TraversableViewTemplate$Sliced$class.stringPrefix(TraversableViewTemplate.scala:54)
>> at
>> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61)
>> at scala.collection....
>> scala> a.length
>> java.lang.StackOverflowError
>> at
>> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
>> at
>> scala.collection.generic.MutableVectorViewTemplate$$anon$2.length(MutableVectorViewTemplate.scala:61)
>> at
>> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34)
>> at scala.collection.generic.MutableVectorViewTemp...
>> scala> a(1)
>> java.lang.StackOverflowError
>> at
>> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
>> at
>> scala.collection.generic.SequenceViewTemplate$Sliced$class.apply(SequenceViewTemplate.scala:36)
>> at
>> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61)
>> at scala.collection.generic.SequenceViewTemplat...
>>
>
> As far as I understand, the slice and drop functions do not try to flatten
> the data access path, forwarding all the calls to the immediate parent
> slice
> instead of the original datastructure. This basically means that the slice
> is unusable and doesn't scale well (at least performance-wise).
>
> Ivan
>
>
without more code it's hard to give you a hint. An obvious but ugly solution
would be to drop nothing from the array but to add a running index. Or can
you change the type of the initial data structure to something more "drop
friendly" (e.g a queue)? Could you try out Scala 2.8 with the new
collections (which might solve the problem)?
Privyet,
Daniel
--
View this message in context: http://www.nabble.com/-scala--New-collections-%28slice-drop%29-problem-tp24040606p24049217.html
Sent from the Scala mailing list archive at Nabble.com.
« Return to Thread: [scala] New collections (slice/drop) problem
| Free embeddable forum powered by Nabble | Forum Help |