Re: [scala] Scala Actors - no speedup?

On Tuesday 25 September 2007 15:14, you wrote:
> On 9/26/07, Randall R Schulz <rschulz@...> wrote:
> > On Tuesday 25 September 2007 14:13, Akhilesh Mritunjai wrote:
> > > ...
> > >
> > > So *even* if the whole thing is done in a single thread, the
> > > performance is around 1 gig/s... so there is absolutely no point
> > > parallelizing this thing. Even my 3 yr old processor (let alone a
> > > more modern Core2Duo@2GHz) would easily saturate ANY potential
> > > data stream you can supply... a typical HDD doesn't go over 80
> > > MB/s.
> >
> > "Typical" personal / desktop computers, maybe.
> >
> > But servers today often use 15,000 RPM drives connected by
> > Ultra-320 SCSI busses.
>
> Their typical performance is around 100 MB/s.

You're also ignoring RAID and OS- / file-system-driven read-ahead.

> > Furthermore, the first flash-RAM-based mass storage devices (as in
> > disk replacements) are now becoming available. I don't know what
> > their performance characteristics are, but it's time for software
> > engineers
>
> Their typical performance is even less than rotating magnetic disks
> and max out around 80 MB/s.

So far. This is a very new technology, and will undoubtedly improve. I think it's reasonable to expect it to exceed electromechanical storage before very long.

> Both of them contribute nothing to streaming data bandwidth, both
> only lower the disk latencies (2ms typical for 15k rpm disks, few uS
> for flash).

Keep in mind, too, that optimal file system design will change when there are no longer any appreciable latencies associated with rotation or head movement.

> ...
>
> So unfortunately, IO bottlenecks ain't going anywhere soon.

Only really true when the processing costs per byte are very low. I don't consider that typical, either.

> - Akhilesh

Randall Schulz
|
|
Re: [scala] Scala Actors - no speedup?

Akhilesh Mritunjai wrote:
>
> Can you just whip up a sample code in Scala that just reads the file
> (and throws away) ?
>
> I/O, especially in higher level languages, does consume CPU cycles
> because fetching the data itself requires CPU cycles. That is why Sun
> x4500 (Thumper) has 2X dual core Opteron CPUs in there even though
> it's supposed to be a file server... sustaining 2GB/s (max it is
> capable of) requires serious CPU horse power.
>
> Your observation of introducing the regexp processing increasing the
> time can also be explained with this as regexp is basically sharing
> CPU with IO operations... which themselves require CPU.
>
> The thing to really worry about in this benchmark is that IO in Scala
> *can* potentially have so much overhead !! This is something I want to
> explore further.
>

One other factor is the conversion of byte streams to characters. This is a very high cost operation, especially in the JVM where characters are 2 bytes each.

The one time I saw Ruby outperform Java was reading log files and applying a RegEx to each line. It was a very simple program and Ruby did not need to do the byte -> char conversion and beat the JVM by 5% or so.

> - Akhilesh
>
|
|
Re: [scala] Scala Actors - no speedup?

On 9/26/07, David Pollak <dpp@...> wrote:
> Akhilesh Mritunjai wrote:
> > The thing to really worry about in this benchmark is that IO in Scala
> > *can* potentially have so much overhead !! This is something I want to
> > explore further.
> >
> One other factor is the conversion of byte streams to characters. This
> is a very high cost operation, especially in the JVM where characters
> are 2 bytes each.

Indeed. It looks like the culprit, but I'm still not sure it is the whole story. Character conversion can't be *so* slow that it pegs a 2GHz Core2Duo CPU at 100% while doing roughly 50-200MB/s (accounting for buffering etc). It just seems... wrong.

> The one time I saw Ruby outperform Java was reading log files and
> applying a RegEx to each line. It was a very simple program and Ruby
> did not need to do the byte -> char conversion and beat the JVM by 5% or so.

Interesting. I do think that it'd be better in that case (to compare apples/apples) to use the new nio routines to feed the regexp engine with data... double points if the Java guys had also used the memory-mapped IO routines in nio. Ruby isn't (yet) Unicode aware, so then it would have been _some_ fair benchmark.

Umm, btw, do Scala IO routines use nio or just java.io ?

- Akhilesh
|
|
Re: [scala] Scala Actors - no speedup?

On 2007-09-25 15:31:26 David Pollak wrote:
> One other factor is the conversion of byte streams to characters.
> This is a very high cost operation, especially in the JVM where
> characters are 2 bytes each.
>
> The one time I saw Ruby outperform Java was reading log files and
> applying a RegEx to each line. It was a very simple program and Ruby
> did not need to do the byte -> char conversion and beat the JVM by 5%
> or so.

It might be worth investigating the brics.dk automaton code. As well as being much faster than Java regexes (at the expense of features), it should IIRC be fairly easy to modify it to work directly on mmapped ByteBuffers (since I presume most regexes aren't interested in non-ASCII characters).

/J
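To make the "work directly on mmapped ByteBuffers" idea concrete, here is a minimal sketch that memory-maps a file and counts occurrences of an ASCII pattern by comparing raw bytes, so no byte -> char conversion happens at all. It is not the brics.dk automaton: a naive substring scan stands in for it, and the names `MappedScan`, `count` and `countInFile` are made up for illustration. It also assumes the file fits in a single mapping (under 2 GB).

```scala
import java.io.RandomAccessFile
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

object MappedScan {
  /** Counts (possibly overlapping) occurrences of `needle` in `buf` by comparing raw bytes. */
  def count(buf: ByteBuffer, needle: Array[Byte]): Int = {
    require(needle.length > 0, "needle must not be empty")
    val last = buf.limit() - needle.length
    var hits = 0
    var i = 0
    while (i <= last) {
      var j = 0
      // Absolute get(index) leaves the buffer's position untouched.
      while (j < needle.length && buf.get(i + j) == needle(j)) j += 1
      if (j == needle.length) hits += 1
      i += 1
    }
    hits
  }

  /** Maps the whole file read-only and scans it; assumes an ASCII needle and a file under 2 GB. */
  def countInFile(path: String, needle: String): Int = {
    val file = new RandomAccessFile(path, "r")
    try {
      val ch = file.getChannel()
      val buf = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size())
      count(buf, needle.getBytes("US-ASCII"))
    } finally file.close()
  }
}
```

Something like `MappedScan.countInFile("access.log", "GET ")` (the file name is only an example) would then give a byte-level baseline to compare against the String-based regex version.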
|
|
Re: [scala] Scala Actors - no speedup?

Just for some additional reading:
http://patricklogan.blogspot.com/2007/09/regular-expression-matching-can-be.html

On 9/26/07, Akhilesh Mritunjai <mritun@...> wrote:
On 9/26/07, David Pollak <dpp@...> wrote:
|
|
Re: [scala] Scala Actors - no speedup?

On 9/25/07, David Pollak <dpp@...> wrote:
> Sorry... should have read your blog post...
>
> The first problem is your hardware... the hard drive in the MacBook Pro
> is a dog.
>
> The second problem is your OS... the Mach Kernel is notoriously bad for
> doing IO... Linus has an occasional rant about this. My MacBook Pro has
> something like 1/2 the disk read performance of my ThinkPad T41 running
> Ubuntu (same disk rotation speed and the TPad has an IDE drive, not SATA.)
>
> My guess is that you're spending nearly a whole CPU doing disk IO... you
> might try a larger file and see how much of your time is spent in System
> vs. User.

Yes, I think you've uncovered a key problem with this kind of experiment: multiple cores won't help if you're IO bound. When this is a problem, not only should your computation be distributed but so should the data being processed. Keeping your logfiles in a distributed file system like NDFS will make distributing the workload more fruitful. Alternatively, if you have a small enough number of logfiles, you could take the IO hit up front and perform the work in stages:

1) send logfile/N to each machine in N
2) process logfile chunk N on machine N
3) aggregate the results

For an example, take a look at Sawzall[1], an external DSL built on top of MapReduce utilizing GFS for logfile processing. Also, Yahoo Pig[2] is an open source research project using a distributed file system and an open source clone of mapreduce to do logfile processing.

Bringing this back to Scala, it shouldn't be rocket science to build something like mapreduce on top of scala remote actors. I have a prototype built using rabbitmq instead of remote actors. Soon, I'll clear a few days off the calendar to clean it up and release it.

Steve

[1]: http://labs.google.com/papers/sawzall-sciprog.pdf
[2]: http://research.yahoo.com/project/pig
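The three stages above can be sketched on plain in-memory collections. This is only an illustration of the split / process / aggregate shape, not Steve's prototype: there are no remote actors, no rabbitmq and no distributed file system here, and the names `StagedCount`, `split`, `process` and `aggregate` are hypothetical. Each chunk stands in for the piece of the logfile that would be shipped to one machine.

```scala
object StagedCount {
  type Counts = Map[String, Int]

  /** Stage 1: cut the lines into at most `n` roughly equal chunks, one per (imagined) machine. */
  def split(lines: Seq[String], n: Int): Seq[Seq[String]] = {
    require(n > 0, "need at least one chunk")
    val size = math.max(1, (lines.size + n - 1) / n)
    lines.grouped(size).toList
  }

  /** Stage 2: count keys within one chunk; `key` returns None for lines to skip. */
  def process(chunk: Seq[String], key: String => Option[String]): Counts =
    chunk.foldLeft(Map.empty[String, Int]) { (acc, line) =>
      key(line) match {
        case Some(k) => acc.updated(k, acc.getOrElse(k, 0) + 1)
        case None    => acc
      }
    }

  /** Stage 3: merge the partial counts from every chunk into one result. */
  def aggregate(parts: Seq[Counts]): Counts =
    parts.foldLeft(Map.empty[String, Int]) { (acc, part) =>
      part.foldLeft(acc) { case (m, (k, v)) => m.updated(k, m.getOrElse(k, 0) + v) }
    }
}
```

Because stage 2 only ever sees its own chunk and stage 3 only sees partial counts, the middle step is the part that could be handed to separate machines (or actors) without changing the other two.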
|
|
Re: [scala] Scala Actors - no speedup?

> Can you just whip up a sample code in Scala that just reads the file
> (and throws away) ?

I did. It takes 12 seconds to just go through the file (with splitting it into lines), compared to 18 seconds when matching regular expressions.

Martin
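For anyone who wants to repeat this measurement, a read-and-discard baseline could look roughly like the sketch below. It is not Martin's program (that was not posted); `ReadOnly`, `countLines` and `timeMillis` are names chosen for illustration, and it assumes the plain java.io route with the platform default charset. The line count is returned only so that the loop does something observable.

```scala
import java.io.{BufferedReader, FileReader}

object ReadOnly {
  /** Reads the file line by line and throws every line away, returning how many were seen. */
  def countLines(path: String): Long = {
    val in = new BufferedReader(new FileReader(path))
    try {
      var n = 0L
      while (in.readLine() != null) n += 1
      n
    } finally in.close()
  }

  /** Runs `body` once and returns its result with the elapsed wall-clock time in milliseconds. */
  def timeMillis[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val (lines, ms) = timeMillis(countLines(args(0)))
    println(s"$lines lines in $ms ms")
  }
}
```

Running it twice on the same file is worthwhile, since the second run is usually served from the OS file cache and shows how much of the time is CPU rather than disk.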
|
|
Re: [scala] Scala Actors - no speedup?

It seems to me, to make the benchmarks more useful, we have to clear up whether Unicode support is needed or not in Tim's Wide Finder. If not, one can work with the byte-oriented Java IO classes and some amount of memory and processing time should be shaved off. The big downside is that many of the useful Java libraries for doing things like Regex are character- or String-based. One would need a byte-oriented version of the same libraries, or else hand-code it. Would be interesting to see the difference in the two approaches (byte versus character), in any case.

A separate issue is whether the matching can be done against a stream, or against a model (String) of the data in-memory.

Patrick (lurking)
|
|
Re: [scala] Scala Actors - no speedup?

Tim's example processes webserver log files, and I think *never* has to deal with anything except plain ASCII.

Scala/Java programs *will* run faster with byte IO, but imho, there is no need for that. All it'd take is an hour to code something that uses byte stream IO & converts from byte to char /really/ fast (because it'd know that there is no Unicode character).

But I think it's all pointless, because we may think that this thing is letting us down in benchmarks, but I think it'd take them much longer than an hour to do a proper Unicode application in ruby/perl/whatever.

On 9/26/07, Patrick Wright <pdoubleya@...> wrote:
> It seems to me, to make the benchmarks more useful, we have to clear
> up whether Unicode support is needed or not in Tim's Wide Finder. If
> not, one can work with the byte-oriented Java IO classes and some
> amount of memory and processing time should be shaved off. The big
> downside is that many of the useful Java libraries for doing things
> like Regex are character- or String-based. One would need a
> byte-oriented version of the same libraries, or else hand-code it.
> Would be interesting to see the difference in the two approaches (byte
> versus character), in any case.
>
> A separate issue is whether the matching can be done against a stream,
> or against a model (String) of the data in-memory.
>
>
> Patrick (lurking)
>
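The "byte stream IO plus a really fast byte to char conversion" idea from the reply above might look something like this sketch. It assumes the input is plain ASCII with '\n' line endings, widens each byte straight to a char without going through a charset decoder, and still hands ordinary Strings to the caller so the usual String-based regex classes keep working. `AsciiLines` and `foreachLine` are illustrative names, not an existing API, and whether this is actually faster than an InputStreamReader would have to be measured.

```scala
import java.io.InputStream

object AsciiLines {
  /** Calls `f` once per line. Bytes are widened directly to chars; assumes ASCII input and '\n' endings. */
  def foreachLine(in: InputStream)(f: String => Unit): Unit = {
    val buf = new Array[Byte](64 * 1024)
    val line = new java.lang.StringBuilder(256)
    var n = in.read(buf)
    while (n != -1) {
      var i = 0
      while (i < n) {
        val b = buf(i)
        if (b == 10) { // 10 is '\n': end of the current line
          f(line.toString)
          line.setLength(0)
        } else {
          line.append((b & 0xff).toChar) // no decoder, just zero-extend the byte
        }
        i += 1
      }
      n = in.read(buf)
    }
    if (line.length() > 0) f(line.toString) // final line without a trailing newline
  }
}
```

Wrapping a `java.io.FileInputStream` in this and applying the regex inside `f` keeps the matching code unchanged while taking the decoder out of the picture.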
|
|
Re: [scala] Scala Actors - no speedup?

On Tuesday 25 September 2007 15:01, Randall R Schulz wrote:
> ...
>
> Furthermore, the first flash-RAM-based mass storage devices (as in
> disk replacements) are now becoming available. I don't know what
> their performance characteristics are, but it's time for software
> engineers to start preparing for computing hardware that is,
> mercifully, free of rotating electromechanical mass storage.
> Hallelujah!

Anyone interested in the near-term future of non-magnetic, no-moving-parts storage might want to check out this article from Tom's Hardware Guide:

<http://www.tgdaily.com/content/view/34065/118/>

Most interesting is the third slide from the sequence starting at <http://www.tgdaily.com/picturegalleries/gallery-20070926.html>:

Randall Schulz