|
View:
New views
14 Messages
—
Rating Filter:
Alert me
|
|
|
Why does this program eat RAM?I have the below program, and I'm trying to run it on an input of about
90MB. It eats RAM like crazy, and I can't figure out why. I do know that the problem is not my custwords function (as you can see, I replaced the call to it with a call to the standard words function on the last line). It seems to be wordfreq, but I don't know why. For information about the problem, see http://changelog.complete.org/posts/535-A-Haskell-solution-to-Lars-Problem.html Here's the code: import System.Environment import Data.List import Data.Char import qualified Data.Map as Map custwords = filter (/= "") . lines . map (conv . toLower) where iswordchar x = isAlphaNum x && isAscii x conv x = if iswordchar x then x else '\n' wordfreq inp = Map.toList $ foldl' updatemap (Map.empty::Map.Map String Int) inp where updatemap nm word = Map.insertWith updatefunc word 1 nm updatefunc _ x = x + 1 freqsort (w1, c1) (w2, c2) = if c1 == c2 then compare w1 w2 else compare c2 c1 showit (word, count) = show count ++ " " ++ word main = do args <- getArgs interact $ unlines . map showit . take (read . head $ args) . sortBy freqsort . wordfreq . words _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?The x+1 looks suspicious.
On Sep 4, 2006, at 23:03 , John Goerzen wrote: > I have the below program, and I'm trying to run it on an input of > about > 90MB. It eats RAM like crazy, and I can't figure out why. > > I do know that the problem is not my custwords function (as you can > see, > I replaced the call to it with a call to the standard words > function on > the last line). It seems to be wordfreq, but I don't know why. > > For information about the problem, see > http://changelog.complete.org/posts/535-A-Haskell-solution-to-Lars- > Problem.html > > Here's the code: > > import System.Environment > import Data.List > import Data.Char > import qualified Data.Map as Map > > custwords = filter (/= "") . lines . map (conv . toLower) > where iswordchar x = isAlphaNum x && isAscii x > conv x = if iswordchar x then x else '\n' > > wordfreq inp = Map.toList $ foldl' updatemap (Map.empty::Map.Map > String Int) inp > where updatemap nm word = Map.insertWith updatefunc word 1 nm > updatefunc _ x = x + 1 > > freqsort (w1, c1) (w2, c2) = if c1 == c2 > then compare w1 w2 > else compare c2 c1 > > showit (word, count) = show count ++ " " ++ word > main = do args <- getArgs > interact $ unlines . map showit . take (read . head $ > args) . > sortBy freqsort . wordfreq . words > > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@... > http://www.haskell.org/mailman/listinfo/haskell-cafe _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?At Tue, 5 Sep 2006 03:03:51 +0000 (UTC),
John Goerzen wrote: > > I have the below program, and I'm trying to run it on an input of about > 90MB. It eats RAM like crazy, and I can't figure out why. I have not looked in detail at your code -- but it could simply be the fact that String requires gobs of memory to store a string. If you forced all 90MB into memory at once, I would expect it to take almost of gig of RAM. (Around a 10-11 fold increase in size). I suspect this line could be forcing the whole thing into memory: > wordfreq = map (\x -> (head x, length x)) . group . sort because sort can not return the first element until it has looked at all the elements in the list to determine which one should be first. If you fold a Data.Map or associative list over the word-list, then you could probably get the lazy behaviour you expect. j. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?jeremy.shaw:
> At Tue, 5 Sep 2006 03:03:51 +0000 (UTC), > John Goerzen wrote: > > > > I have the below program, and I'm trying to run it on an input of about > > 90MB. It eats RAM like crazy, and I can't figure out why. > > I have not looked in detail at your code -- but it could simply be the > fact that String requires gobs of memory to store a string. If you > forced all 90MB into memory at once, I would expect it to take almost > of gig of RAM. (Around a 10-11 fold increase in size). > > I suspect this line could be forcing the whole thing into memory: > > > wordfreq = map (\x -> (head x, length x)) . group . sort > > because sort can not return the first element until it has looked at > all the elements in the list to determine which one should be first. > > If you fold a Data.Map or associative list over the word-list, then > you could probably get the lazy behaviour you expect. A quick hack up to use Data.ByteString uses a lot less ram, though profiling still shows 95% of time spent in the building the Map. import System.Environment import Data.Char import Data.List import qualified Data.Map as Map import qualified Data.ByteString.Char8 as B import Data.ByteString (ByteString) wordfreq inp = Map.toList $ foldl' k m inp where m = Map.empty :: Map.Map ByteString Int k n w = Map.insertWith f w 1 n f _ x = let y = x + 1 in y `seq` y freqsort (w1, c1) (w2, c2) | c1 == c2 = compare w1 w2 | otherwise = compare c2 c1 showit (w, c) = B.join (B.singleton ' ') [B.pack(show c), w] main :: IO () main = do args <- getArgs B.interact $ B.unlines . map showit . take (read . head $ args) . sortBy freqsort . wordfreq . B.words -- Don _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?At Mon, 04 Sep 2006 22:05:57 -0700,
Jeremy Shaw wrote: > > At Tue, 5 Sep 2006 03:03:51 +0000 (UTC), > John Goerzen wrote: > > > > I have the below program, and I'm trying to run it on an input of about > > 90MB. It eats RAM like crazy, and I can't figure out why. > If you fold a Data.Map or associative list over the word-list, then > you could probably get the lazy behaviour you expect. Oops, I got distracted and looked at the version on the top of the site, not your email :) Anyway, I will still consider the possibility that the hold String is being forced into memory. j. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?Donald Bruce Stewart wrote:
> A quick hack up to use Data.ByteString uses a lot less ram, though > profiling still shows 95% of time spent in the building the Map. Nice! > k n w = Map.insertWith f w 1 n > f _ x = let y = x + 1 in y `seq` y y `seq` y is semantically equivalent to y though. The strictness would have to be built into insertWith to make this work. regards, Bertram _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re[2]: Why does this program eat RAM?Hello Bertram,
Tuesday, September 5, 2006, 12:24:57 PM, you wrote: >> A quick hack up to use Data.ByteString uses a lot less ram, though >> profiling still shows 95% of time spent in the building the Map. Data.HashTable may be a faster alternative for Map (if ordering isn't required) -- Best regards, Bulat mailto:Bulat.Ziganshin@... _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?John Goerzen wrote:
> I have the below program, and I'm trying to run it on an input of about > 90MB. It eats RAM like crazy, and I can't figure out why. > > wordfreq inp = Map.toList $ foldl' updatemap (Map.empty::Map.Map String Int) inp > where updatemap nm word = Map.insertWith updatefunc word 1 nm > updatefunc _ x = x + 1 The culprit is insertWith, it inserts unevaluated thunks into your map where you want a simple value. To avoid a space leak, you want a strict update function (yours is strict enough) and insertWith must be strict in the newly inserted value (the result of applying updatefunc). Since you cannot influence the strictness of insertWith, no matter how many seqs you sprinkle through your code, you need insertWith', which is missing. You can simulate it, however: insertWith' f k v m = case Map.lookup k m of Nothing -> Map.insert k v m Just w -> (Map.insert k $! f w v) m IMHO all accumulating functions, especially foldl, State.update, Map.insertWith, accumArray, absolutely need a strict version, because the strictness cannot be recovered by the library's user. If the clutter of too many primed names is unbearable, leave out the _lazy_ version. It's useless IME and lazyness can be recovered if the need arises. Udo. -- Wo die Macht geistlos ist, ist der Geist machtlos. -- aus einem Gipfelbuch _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?Bulat Ziganshin wrote:
> Hello Bertram, > > Tuesday, September 5, 2006, 12:24:57 PM, you wrote: > >>> A quick hack up to use Data.ByteString uses a lot less ram, though >>> profiling still shows 95% of time spent in the building the Map. > > Data.HashTable may be a faster alternative for Map (if ordering isn't > required) > > I found Data.HashTable a bit slow (ghc 6.4). Perhaps HsJudy (see http://cmarcelo.blogspot.com/ and http://judy.sourceforge.net/ and http://www.mail-archive.com/haskell@.../msg18766.html ) _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Re[2]: Why does this program eat RAM?Bulat Ziganshin wrote:
> Data.HashTable may be a faster alternative for Map (if ordering isn't > required) Or it may not. Finding a good hash function for the words John is counting, is a challenge itself. Finding a good one that doesn't look at each character at least once, might be outright impossible. That means, a hash table cannot do significantly less work than the appropriate data structure, which is a trie, aka Data.StringMap. Udo. -- Q: Why do mountain climbers rope themselves together? A: To prevent the sensible ones from going home. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?On 2006-09-05, Udo Stenzel <u.stenzel@...> wrote:
> The culprit is insertWith, it inserts unevaluated thunks into your map This turned out to be the answer -- thanks! I posted a new version of the code here: http://changelog.complete.org/posts/536-Another-Haskell-Solution-to-Lars-Problem.html This particular test set was a few hundred copies of the GPL. So the Map was fairly small, since each word occured many, many times -- but there weren't a whole lot of words. So the problem was not the inefficiency of Data.Map, nor the inefficiency of Strings (though that inefficiency does explain why the Python solution is faster, I'm sure). -- John _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Re[2]: Why does this program eat RAM?On 2006-09-05, Bulat Ziganshin <bulat.ziganshin@...> wrote:
> Hello Bertram, > > Tuesday, September 5, 2006, 12:24:57 PM, you wrote: > >>> A quick hack up to use Data.ByteString uses a lot less ram, though >>> profiling still shows 95% of time spent in the building the Map. > > Data.HashTable may be a faster alternative for Map (if ordering isn't > required) Indeed ordering wasn't required, but as HashTable lives in the IO monad, it's not, well, very "Haskellish". -- John _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?On Tue, Sep 05, 2006 at 12:55:48PM +0200, Udo Stenzel wrote:
> IMHO all accumulating functions, especially foldl, State.update, > Map.insertWith, accumArray, absolutely need a strict version, because > the strictness cannot be recovered by the library's user. We already have foldl'. Here's a strict version of fmap: import Control.Applicative import Data.Traversable newtype Strict a = Strict a getStrict (Strict x) = x instance Functor Strict where fmap f (Strict x) = Strict (f x) -- doesn't quite satisfy the Applicative laws instance Applicative Strict where pure x = Strict x Strict f <*> Strict x = Strict (f $! x) fmap' :: Traversable f => (a -> b) -> f a -> f b fmap' f t = getStrict (traverse (Strict . f) t) _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
|
|
Re: Why does this program eat RAM?On Sep 5, 2006, at 7:05 AM, Chris Kuklewicz wrote: > Bulat Ziganshin wrote: >> Hello Bertram, >> Tuesday, September 5, 2006, 12:24:57 PM, you wrote: >>>> A quick hack up to use Data.ByteString uses a lot less ram, though >>>> profiling still shows 95% of time spent in the building the Map. >> Data.HashTable may be a faster alternative for Map (if ordering isn't >> required) > > I found Data.HashTable a bit slow (ghc 6.4). Perhaps HsJudy (see > http://cmarcelo.blogspot.com/ and http://judy.sourceforge.net/ and > http://www.mail-archive.com/haskell@.../msg18766.html ) try. It uses a simple multiplicative hash function (a la Knuth) which seems to be dramatically better in practice. It also uses a rather simpler hash table implementation which seems to perform slightly better in practice (if this isn't true for your application I'm keen to know). As Udo Stenzel points out, we still need to examine the entire string in order to hash, and some problems may do better with something like a StringMap---I understand many information retrieval applications use Trie-like data structures for exactly this reason. -Jan-Willem Maessen > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@... > http://www.haskell.org/mailman/listinfo/haskell-cafe _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@... http://www.haskell.org/mailman/listinfo/haskell-cafe |
| Free embeddable forum powered by Nabble | Forum Help |