[jira] Created: (JRUBY-3784) More than 2G memory required for jruby -e 'buf = IO.read("/tmp/1GB.txt"); p buf.size'

by JIRA jira@codehaus.org


More than 2G memory required for jruby -e 'buf = IO.read("/tmp/1GB.txt"); p buf.size'
-------------------------------------------------------------------------------------

                 Key: JRUBY-3784
                 URL: http://jira.codehaus.org/browse/JRUBY-3784
             Project: JRuby
          Issue Type: Bug
    Affects Versions: JRuby 1.3.1
            Reporter: Wayne Meissner
            Assignee: Thomas E Enebo
             Fix For: JRuby 1.4


Leaving aside the wisdom or otherwise of trying to read 1G of data in one go, JRuby fails to load a 1G file even when the JVM heap is set to 2G.

e.g.
./bin/jruby -J-Xmx2048m -e 'buf = IO.read("/tmp/1GB.txt"); p buf.size'
Error: Your application used more memory than the safety cap of 2048m.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace

Part of this is due to the way the JVM does file I/O. When doing a 1G read into a heap buffer, the JVM allocates a temporary 1G direct ByteBuffer, reads into that direct buffer, and then copies from the direct buffer to the heap buffer.

Ergo, for a 1G read, it will allocate 2G of memory (1G heap, 1G direct).
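A simplified model of the behaviour described above, written against plain java.nio. It assumes heap-buffer reads are serviced roughly this way and is only an illustration, not the JDK's own code (the real logic lives in JDK-internal classes). The class `TempDirectBufferModel` and method `readIntoHeap` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

public class TempDirectBufferModel {
    // Hypothetical model of a channel read into a heap buffer: a temporary
    // direct buffer sized to dst.remaining() is allocated, filled, and copied
    // back. For a 1G heap read this means a second, 1G direct allocation.
    static int readIntoHeap(ReadableByteChannel ch, ByteBuffer dst) throws IOException {
        ByteBuffer direct = ByteBuffer.allocateDirect(dst.remaining());
        int n = ch.read(direct);      // the actual I/O targets the direct buffer
        if (n > 0) {
            direct.flip();            // switch the direct buffer to draining mode
            dst.put(direct);          // copy direct -> heap
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, world".getBytes(StandardCharsets.US_ASCII);
        ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data));
        ByteBuffer heap = ByteBuffer.allocate(data.length);
        int n = readIntoHeap(ch, heap);
        System.out.println("read " + n + " bytes: "
            + new String(heap.array(), 0, heap.position(), StandardCharsets.US_ASCII));
    }
}
```

The direct allocation scales with the size of the request, not with the size of the data actually available, which is why a single huge read doubles the peak footprint.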

Splitting reads larger than 1M into chunks of at most 1M seems to alleviate this.

{format}
diff --git a/src/org/jruby/util/io/ChannelStream.java b/src/org/jruby/util/io/ChannelStream.java
index 4582eca..44b8614 100644
--- a/src/org/jruby/util/io/ChannelStream.java
+++ b/src/org/jruby/util/io/ChannelStream.java
@@ -362,10 +362,22 @@ public class ChannelStream implements Stream, Finalizable {
             // Now read unbuffered directly from the file
             //
             while (buf.hasRemaining()) {
-                int n = channel.read(buf);
+                final int MAX_READ_CHUNK = 1 * 1024 * 1024;
+                //
+                // When reading into a heap buffer, the jvm allocates a temporary
+                // direct ByteBuffer of the requested size.  To avoid allocating
+                // a huge direct buffer when doing ludicrous reads (e.g. 1G or more)
+                // we split the read up into chunks of no more than 1M
+                //
+                ByteBuffer tmp = buf.duplicate();
+                if (tmp.remaining() > MAX_READ_CHUNK) {
+                    tmp.limit(tmp.position() + MAX_READ_CHUNK);
+                }
+                int n = channel.read(tmp);
                 if (n <= 0) {
                     break;
                 }
+                buf.position(tmp.position());
             }
             eof = true;
             result.length(buf.position());
{format}
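The chunking loop from the patch can be pulled out into a standalone method and exercised without JRuby. This is a sketch, not JRuby's code: `ChunkedRead`, `readChunked` and the `RecordingChannel` test wrapper are hypothetical names, and the input comes from an in-memory stream rather than a file. The wrapper records the largest `remaining()` the channel is ever asked to fill, which, per the explanation in the description, bounds the size of the temporary direct buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ChunkedRead {
    // Same cap as the patch: never request more than 1M per read() call.
    static final int MAX_READ_CHUNK = 1 * 1024 * 1024;

    // Fill buf from ch in chunks of at most maxChunk bytes.
    // Returns the total number of bytes read.
    static int readChunked(ReadableByteChannel ch, ByteBuffer buf, int maxChunk) throws IOException {
        int start = buf.position();
        while (buf.hasRemaining()) {
            ByteBuffer tmp = buf.duplicate(); // shares content, independent position/limit
            if (tmp.remaining() > maxChunk) {
                tmp.limit(tmp.position() + maxChunk);
            }
            int n = ch.read(tmp);
            if (n <= 0) {
                break; // EOF (-1) or no progress
            }
            buf.position(tmp.position()); // advance the real buffer past the new bytes
        }
        return buf.position() - start;
    }

    // Hypothetical test helper: records the largest request the channel sees.
    static final class RecordingChannel implements ReadableByteChannel {
        final ReadableByteChannel delegate;
        int largestRequest = 0;

        RecordingChannel(ReadableByteChannel delegate) { this.delegate = delegate; }

        @Override public int read(ByteBuffer dst) throws IOException {
            largestRequest = Math.max(largestRequest, dst.remaining());
            return delegate.read(dst);
        }
        @Override public boolean isOpen() { return delegate.isOpen(); }
        @Override public void close() throws IOException { delegate.close(); }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[3 * MAX_READ_CHUNK + 123];
        RecordingChannel ch = new RecordingChannel(
            Channels.newChannel(new ByteArrayInputStream(data)));
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        int total = readChunked(ch, buf, MAX_READ_CHUNK);
        System.out.println("read " + total + " bytes, largest single request " + ch.largestRequest);
    }
}
```

Because `duplicate()` shares the backing storage with `buf`, bytes read into `tmp` land directly in `buf`; only the position has to be copied back, which is what `buf.position(tmp.position())` does.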

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
