|
View:
New views
4 Messages
—
Rating Filter:
Alert me
|
|
|
[scala] New collections (slice/drop) problemI'm implementing some binary stream data parsing, and I approached it in the following way:
1. Data is read into Array[Byte] (when I have access to all the data I need). 2. The array is passed into a function deserialize(data: Seq[Byte]): Seq[Record], which parses the bytes into records. 3. deserialize function has a tail recursive function deserializeRecord(data: Seq[Byte], records: Queue[Record]): Seq[Record] inside which parses a single Record and enqueues it into the end of records queue (initially empty). 4. deserializeRecord tail-calls itself: deserializeRecord(data.drop(amountOfParsedBytes), records.enqueue(parsedRecord)) The initial data array size is around 256kb, each record is around 6 bytes. The first problem I found was that data.drop call was extremely slow, likely because a new array was created on each drop and the data was copied (I didn't expect that). Then I started passing a view of my initial array: deserialize(dataBytesArray.view). That increased the performance quite a bit (although it's still bad), but introduced another problem: I started getting StackOverflowError on calls to data in deserializeRecord when a lot of drops have been done. Example: Welcome to Scala version 2.8.0.r17998-b20090605125720 (Java HotSpot(TM) Server VM, Java 1.6.0_12). As far as I understand, the slice and drop functions do not try to flatten the data access path, forwarding all the calls to the immediate parent slice instead of the original datastructure. This basically means that the slice is unusable and doesn't scale well (at least performance-wise). Ivan |
|
|
Re: [scala] New collections (slice/drop) problemHi Ivan, without more code it's hard to give you a hint. An obvious but ugly solution would be to drop nothing from the array but to add a running index. Or can you change the type of the initial data structure to something more "drop friendly" (e.g a queue)? Could you try out Scala 2.8 with the new collections (which might solve the problem)? Privyet, Daniel |
|
|
Re: [scala] New collections (slice/drop) problemI'm using one of the recent development builds of 2.8.0 (so I'm using the new collections API). Do you suggest upgrading to the latest?
I don't like the idea of "not dropping and passing the index" — that's quite ugly and defeats the purpose of the APIs. To me it looks like a bug in the Scala collections classes, however I wanted to make sure that I'm not doing anything stupid (or missing some obvious not-ugly solution) before I submit a bug report. Ivan On Tue, Jun 16, 2009 at 12:38 AM, Landei <Daniel.Gronau@...> wrote:
|
|
|
Re: [scala] New collections (slice/drop) problemHi Ivan,
What you describe is a bug. View slices are supposed to compactify their start and end indices, so that you should not see a stack overflow. I'll look into it. Cheers -- Martin On Tue, Jun 23, 2009 at 8:29 PM, Ivan Tarasov<ivan.tarasov@...> wrote: > I'm using one of the recent development builds of 2.8.0 (so I'm using the > new collections API). Do you suggest upgrading to the latest? > > I don't like the idea of "not dropping and passing the index" — that's quite > ugly and defeats the purpose of the APIs. > > To me it looks like a bug in the Scala collections classes, however I wanted > to make sure that I'm not doing anything stupid (or missing some obvious > not-ugly solution) before I submit a bug report. > > Ivan > > On Tue, Jun 16, 2009 at 12:38 AM, Landei <Daniel.Gronau@...> wrote: >> >> >> >> Ivan Tarasov wrote: >> > >> > I'm implementing some binary stream data parsing, and I approached it in >> > the >> > following way: >> > >> > 1. Data is read into Array[Byte] (when I have access to all the data I >> > need). >> > 2. The array is passed into a function deserialize(data: Seq[Byte]): >> > Seq[Record], which parses the bytes into records. >> > 3. deserialize function has a tail recursive function >> > deserializeRecord(data: Seq[Byte], records: Queue[Record]): Seq[Record] >> > inside which parses a single Record and enqueues it into the end of >> > records >> > queue (initially empty). >> > 4. deserializeRecord tail-calls itself: >> > deserializeRecord(data.drop(amountOfParsedBytes), >> > records.enqueue(parsedRecord)) >> > >> > The initial data array size is around 256kb, each record is around 6 >> > bytes. >> > The first problem I found was that data.drop call was extremely slow, >> > likely >> > because a new array was created on each drop and the data was copied (I >> > didn't expect that). >> > >> > Then I started passing a view of my initial array: >> > deserialize(dataBytesArray.view). That increased the performance quite a >> > bit >> > (although it's still bad), but introduced another problem: I started >> > getting >> > StackOverflowError on calls to data in deserializeRecord when a lot of >> > drops >> > have been done. >> > >> > Example: >> > >> > Welcome to Scala version 2.8.0.r17998-b20090605125720 (Java HotSpot(TM) >> >> Server VM, Java 1.6.0_12). >> >> Type in expressions to have them evaluated. >> >> Type :help for more information. >> >> >> >> scala> var a = new Array[Byte](200000).view >> >> a: java.lang.Object with >> >> >> >> scala.collection.generic.MutableVectorView[Byte,scala.collection.mutable.Vector[Byte]] >> >> = MutableVectorTemplate(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> >> 0, >> >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> >> 0, >> >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, >> >> 0, >> >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... >> >> scala> (1 to 10000).foreach { i => a = a.drop(6) } >> >> >> >> scala> a >> >> java.lang.StackOverflowError >> >> at >> >> >> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61) >> >> at >> >> >> >> scala.collection.generic.TraversableViewTemplate$Sliced$class.stringPrefix(TraversableViewTemplate.scala:54) >> >> at >> >> >> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.stringPrefix(MutableVectorViewTemplate.scala:61) >> >> at scala.collection.... >> >> scala> a.length >> >> java.lang.StackOverflowError >> >> at >> >> >> >> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34) >> >> at >> >> >> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.length(MutableVectorViewTemplate.scala:61) >> >> at >> >> >> >> scala.collection.generic.SequenceViewTemplate$Sliced$class.length(SequenceViewTemplate.scala:34) >> >> at scala.collection.generic.MutableVectorViewTemp... >> >> scala> a(1) >> >> java.lang.StackOverflowError >> >> at >> >> >> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61) >> >> at >> >> >> >> scala.collection.generic.SequenceViewTemplate$Sliced$class.apply(SequenceViewTemplate.scala:36) >> >> at >> >> >> >> scala.collection.generic.MutableVectorViewTemplate$$anon$2.apply(MutableVectorViewTemplate.scala:61) >> >> at scala.collection.generic.SequenceViewTemplat... >> >> >> > >> > As far as I understand, the slice and drop functions do not try to >> > flatten >> > the data access path, forwarding all the calls to the immediate parent >> > slice >> > instead of the original datastructure. This basically means that the >> > slice >> > is unusable and doesn't scale well (at least performance-wise). >> > >> > Ivan >> > >> > >> >> Hi Ivan, >> >> without more code it's hard to give you a hint. An obvious but ugly >> solution >> would be to drop nothing from the array but to add a running index. Or can >> you change the type of the initial data structure to something more "drop >> friendly" (e.g a queue)? Could you try out Scala 2.8 with the new >> collections (which might solve the problem)? >> >> Privyet, >> Daniel >> >> >> -- >> View this message in context: >> http://www.nabble.com/-scala--New-collections-%28slice-drop%29-problem-tp24040606p24049217.html >> Sent from the Scala mailing list archive at Nabble.com. >> > > |
| Free embeddable forum powered by Nabble | Forum Help |