Should we replace MemoryStream?

by Miguel de Icaza-2

Hello folks,

    I just blogged about a memory fragmentation issue here:

        http://tirania.org/blog/archive/2009/Nov-09.html

    And I am wondering: since MemoryStream is one of these sources of
problems, we could replace this implementation with MindTouch's
ChunkedStream.

Miguel.

_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@...
http://lists.ximian.com/mailman/listinfo/mono-devel-list

Re: Should we replace MemoryStream?

by Avery Pennarun-2

On Mon, Nov 9, 2009 at 1:10 PM, Miguel de Icaza <miguel@...> wrote:
>    I just blogged about a memory fragmentation issue here:
>
>        http://tirania.org/blog/archive/2009/Nov-09.html
>
>    And I am wondering: since MemoryStream is one of these sources of
> problems, we could replace this implementation with MindTouch's
> ChunkedStream.

Probably stupid question: why is a compacting garbage collector
actually needed?  C programs have survived for a *long* time without
any ability whatsoever to compact memory, simply by carefully
optimizing their allocation algorithms to avoid fragmentation.  Is the
mono allocator very non-optimal in this respect?

Like I said, I feel like this is a stupid question.  But I'm curious
about the answer, and neither your blog post nor the linked page on
the sgen collector addresses it.

Thanks,

Avery

Re: Should we replace MemoryStream?

by Jeffrey Stedfast

Miguel de Icaza wrote:

> Hello folks,
>
>     I just blogged about a memory fragmentation issue here:
>
> http://tirania.org/blog/archive/2009/Nov-09.html
>
>     And I am wondering: since MemoryStream is one of these sources of
> problems, we could replace this implementation with MindTouch's
> ChunkedStream.
>  

I'm really liking the idea of a chunked stream like this. Even once sgen
is complete and deployed, a chunked stream will still be more efficient,
not just in avoiding fragmentation, but also because reallocating a new,
larger buffer has the overhead of copying the content, whereas
allocating a new chunk is a cheaper operation.
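To put rough numbers on the copying overhead, here is a minimal sketch in Python (not C#, and not Mono's or MindTouch's actual code). The doubling growth policy, the 256-byte starting capacity, the 16 KB chunk size and both function names are assumptions made purely for illustration.

```python
CHUNK_SIZE = 16 * 1024  # assumed chunk size, for illustration only


def bytes_copied_contiguous(total, initial=256):
    """Bytes copied by a single buffer that doubles its capacity when full.

    Hypothetical growth policy; a real MemoryStream may grow differently.
    """
    capacity, length, copied = initial, 0, 0
    while length < total:
        if length == capacity:
            copied += length      # old contents move into the new, larger array
            capacity *= 2
        length += min(capacity - length, total - length)
    return copied


def chunks_allocated(total, chunk_size=CHUNK_SIZE):
    """A chunked stream never moves old data; it only adds fresh chunks."""
    return -(-total // chunk_size)  # ceiling division
```

With these assumptions, writing 1 MB into the doubling buffer copies almost another 1 MB of old data along the way, while the chunked layout allocates 64 chunks and copies nothing.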

Jeff



Re: Should we replace MemoryStream?

by Alan McGovern-2

Hey,

On Mon, Nov 9, 2009 at 6:29 PM, Avery Pennarun <apenwarr@...> wrote:
> On Mon, Nov 9, 2009 at 1:10 PM, Miguel de Icaza <miguel@...> wrote:
> >    I just blogged about a memory fragmentation issue here:
> >
> >        http://tirania.org/blog/archive/2009/Nov-09.html
> >
> >    And I am wondering: since MemoryStream is one of these sources of
> > problems, we could replace this implementation with MindTouch's
> > ChunkedStream.
>
> Probably stupid question: why is a compacting garbage collector
> actually needed?  C programs have survived for a *long* time without
> any ability whatsoever to compact memory, simply by carefully
> optimizing their allocation algorithms to avoid fragmentation.  Is the
> mono allocator very non-optimal in this respect?

One of the causes is that in a garbage-collected language you allocate when you need something and discard it when you're done. If you do this with large buffers which are pinned in memory and have just the wrong allocation pattern, you can bloat your memory usage. There's nothing Mono can do in this case, as essentially it's the user causing the bloat.

One thing you can do is keep a cache of buffers yourself and re-use them. For example, if your application allocates 10 chunked memory streams a second and you Dispose () them when you're done, you could add/remove the 'chunks' to/from a cache. This way you'd only ever allocate 10 x sizeof (chunked stream) bytes of memory and you'd constantly re-use them.
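A minimal sketch of such a cache, written in Python rather than C# to keep it short. The class name BufferCache, its rent/give_back methods and the 16 KB buffer size are all hypothetical; the point is only that a steady rate of 10 buffers per round never allocates more than 10 buffers in total.

```python
class BufferCache:
    """Hands out fixed-size buffers and takes them back for re-use."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.free = []          # buffers waiting to be re-used
        self.allocated = 0      # how many buffers were ever created

    def rent(self):
        if self.free:
            return self.free.pop()
        self.allocated += 1     # only allocate when the cache is empty
        return bytearray(self.buffer_size)

    def give_back(self, buf):
        self.free.append(buf)


cache = BufferCache(16 * 1024)
for _ in range(1000):                           # 1000 rounds of work
    in_use = [cache.rent() for _ in range(10)]  # 10 buffers per round
    for buf in in_use:
        cache.give_back(buf)                    # the "Dispose ()" step
```

After the loop, cache.allocated is still 10, since every later round is served from the buffers returned by the previous one.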

Alan.




Re: Should we replace MemoryStream?

by Andreas Nahr

Are you talking about System.IO.MemoryStream?
Then imho this would be a problematic move.
Most people are used to new MemoryStream (someByteArray) being O(1) time,
but with ChunkedStream it would be O(n). In fact, in those cases ChunkedStream
would need twice the memory because it would still need to retain the
original byte array (e.g. for GetBuffer).

Happy hacking
Andreas

-----Original Message-----
From: mono-devel-list-bounces@...
[mailto:mono-devel-list-bounces@...] On Behalf Of Miguel de
Icaza
Sent: Monday, November 9, 2009 19:10
To: mono-devel-list
Subject: [Mono-dev] Should we replace MemoryStream?

Hello folks,

    I just blogged about a memory fragmentation issue here:

        http://tirania.org/blog/archive/2009/Nov-09.html

    And I am wondering: since MemoryStream is one of these sources of
problems, we could replace this implementation with MindTouch's
ChunkedStream.

Miguel.


Re: Should we replace MemoryStream?

by Miguel de Icaza-2

Hello,

> Probably stupid question: why is a compacting garbage collector
> actually needed?  C programs have survived for a *long* time without
> any ability whatsoever to compact memory, simply by carefully
> optimizing their allocation algorithms to avoid fragmentation.  Is the
> mono allocator very non-optimal in this respect?

Long running applications tend to have this problem.

Either people ignore the problem, or they come up with some creative
solutions, and the solutions are all over the map.

In some cases, for long running processes, people split work into
separate processes and recycle the processes (for example Apache); in
other cases, they use custom memory allocation, pre-allocated pools, or
mark/release pools.




Re: Should we replace MemoryStream?

by Avery Pennarun-2

On Mon, Nov 9, 2009 at 2:48 PM, Miguel de Icaza <miguel@...> wrote:

> Avery wrote:
>> Probably stupid question: why is a compacting garbage collector
>> actually needed?  C programs have survived for a *long* time without
>> any ability whatsoever to compact memory, simply by carefully
>> optimizing their allocation algorithms to avoid fragmentation.  Is the
>> mono allocator very non-optimal in this respect?
>
> Long running applications tend to have this problem.
>
> Either people ignore the problem, or they come up with some creative
> solutions, and the solutions are all over the map.
>
> In some cases, for long running processes, people split work into
> separate processes and recycle the processes (for example Apache); in
> other cases, they use custom memory allocation, pre-allocated pools, or
> mark/release pools.

Thanks for the response.  That makes sense to me.  Short answer:
everybody has this problem, but it's only solvable in the general case
when you have GC and compaction.

Thanks :)

Avery

Re: Should we replace MemoryStream?

by Bjorg

The implementation could be adapted so that if the chunked memory
stream is initialized with an existing byte array it behaves like it
did in the past.  It's possible that the best approach can be
derived for the various MemoryStream constructors.

The question is what is the most common usage pattern?  If it's
GetBuffer(), then there will be a performance and overhead hit.
However, if it's using Write()/Read() as we do, then there are some
significant gains to be had.  GetBytes() would also benefit, though
not as much.

- Steve

--------------
Steve G. Bjorg
http://mindtouch.com
http://twitter.com/bjorg
irc.freenode.net #mindtouch

On Nov 9, 2009, at 11:41 AM, Andreas Nahr wrote:

> Are you talking about System.IO.MemoryStream?
> Then imho this would be a problematic move.
> Most people are used to new MemoryStream (someByteArray) being O(1) time,
> but with ChunkedStream it would be O(n). In fact, in those cases ChunkedStream
> would need twice the memory because it would still need to retain the
> original byte array (e.g. for GetBuffer).
>
> Happy hacking
> Andreas

Re: Should we replace MemoryStream?

by Miguel de Icaza-2

Hello,

> Are you talking about System.IO.MemoryStream?
> Then imho this would be a problematic move.
> Most people are used to new MemoryStream (someByteArray) being O(1) time,
> but with ChunkedStream it would be O(n). In fact, in those cases ChunkedStream
> would need twice the memory because it would still need to retain the
> original byte array (e.g. for GetBuffer).

I don't think it would be O(n); there is only a dereference.


Re: Should we replace MemoryStream?

by PABLOSANTOSLUAC@terra.es

Hi,

I think a very important place where these chunked mem streams HAVE TO be
used (in fact I'll be trying them tomorrow under heavy load on our
testing cluster) is in remoting: there's a huge number of mem streams
being created and destroyed (one for each call) and this can help...
but only if the new ChunkedStream is based on a pool, which I don't think
is the case but could probably be done easily.

What do you think about extending ChunkedMemoryStream to include an
option based on a chunk pool, so memory is actually reused instead of
freed and re-allocated? Wouldn't it be good to reduce fragmentation (I'm
still thinking of remoting)?


pablo


www.plasticscm.com



Steve Bjorg wrote:

> The implementation could be adapted so that if the chunked memory
> stream is initialized with an existing byte array it behaves like it
> did in the past.  It's possible that the best approach can be
> derived for the various MemoryStream constructors.
>
> The question is what is the most common usage pattern?  If it's
> GetBuffer(), then there will be a performance and overhead hit.
> However, if it's using Write()/Read() as we do, then there are some
> significant gains to be had.  GetBytes() would also benefit, though
> not as much.
>
> - Steve

Re: Should we replace MemoryStream?

by Miguel de Icaza-2

Hello,

> What do you think about extending ChunkedMemoryStream to include an
> option based on a chunk pool, so memory is actually reused instead of
> freed and re-allocated? Wouldn't it be good to reduce fragmentation (I'm
> still thinking of remoting)?

This would work.

Additionally, it might make sense to use the suggestion from Steve to
use plain memory allocation up to a certain point (below some threshold)
and then switch to chunks after that, to avoid allocating 16k even for
memory streams that might only hold 100 bytes, for example.
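A rough sketch of that threshold idea, in Python rather than C# and write-only for brevity. The class name HybridStream, the 1 KB default threshold and the method names are assumptions for illustration, not anything from MindTouch's code.

```python
class HybridStream:
    """Write-only sketch: one plain buffer while small, chunks once it grows."""

    def __init__(self, threshold=1024, chunk_size=16 * 1024):
        self.threshold = threshold
        self.chunk_size = chunk_size
        self.small = bytearray()   # plain buffer used below the threshold
        self.chunks = None         # list of fixed-size chunks after the switch
        self.length = 0

    def write(self, data):
        data = bytes(data)
        if self.chunks is None:
            if self.length + len(data) <= self.threshold:
                self.small += data
                self.length += len(data)
                return
            # Crossing the threshold: move what we have into chunk storage.
            pending, self.small, self.chunks = bytes(self.small), None, []
            self.length = 0
            self._append(pending)
        self._append(data)

    def _append(self, data):
        offset = 0
        while offset < len(data):
            if self.length == len(self.chunks) * self.chunk_size:
                # All chunks are full: a whole chunk is allocated only now.
                self.chunks.append(bytearray(self.chunk_size))
            start = self.length % self.chunk_size
            n = min(self.chunk_size - start, len(data) - offset)
            self.chunks[-1][start:start + n] = data[offset:offset + n]
            self.length += n
            offset += n

    def to_bytes(self):
        if self.chunks is None:
            return bytes(self.small)
        return b"".join(self.chunks)[:self.length]
```

A stream that only ever receives 100 bytes stays in the small buffer and never pays for a full chunk; the one-time copy at the switch is bounded by the threshold.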


Re: Should we replace MemoryStream?

by Bjorg

I like this idea a lot.  The only gotcha is that the chunks would need
to be zeroed out before they go back to the pool to avoid leakage of
sensitive information between streams.

Since all the chunks are the same size, a simple lock-free stack can
be used as the pool of chunks.  This would ensure pretty fast access to
them.
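As a sketch of the zero-on-return rule, in Python rather than C#: the ChunkPool name and its take/release methods are hypothetical, and collections.deque is used only as a stand-in for a real lock-free stack (its append() and pop() are thread-safe in CPython, which is not the same thing as a CAS-based stack).

```python
from collections import deque


class ChunkPool:
    """LIFO pool of equally sized chunks that are wiped before re-use."""

    def __init__(self, chunk_size=16 * 1024):
        self.chunk_size = chunk_size
        self._stack = deque()          # stand-in for a lock-free stack

    def take(self):
        try:
            return self._stack.pop()   # most recently released chunk first
        except IndexError:             # pool empty: allocate a fresh chunk
            return bytearray(self.chunk_size)

    def release(self, chunk):
        # Overwrite in place so the next stream cannot read old contents.
        chunk[:] = bytes(len(chunk))
        self._stack.append(chunk)
```

Zeroing on release rather than on take means a chunk never sits in the pool holding another stream's data.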

- Steve

--------------
Steve G. Bjorg
http://mindtouch.com
http://twitter.com/bjorg
irc.freenode.net #mindtouch

On Nov 9, 2009, at 2:53 PM, pablosantosluac@... wrote:

> Hi,
>
> I think a very important place where these chunked mem streams HAVE TO be
> used (in fact I'll be trying them tomorrow under heavy load on our
> testing cluster) is in remoting: there's a huge number of mem streams
> being created and destroyed (one for each call) and this can help...
> but only if the new ChunkedStream is based on a pool, which I don't think
> is the case but could probably be done easily.
>
> What do you think about extending ChunkedMemoryStream to include an
> option based on a chunk pool, so memory is actually reused instead of
> freed and re-allocated? Wouldn't it be good to reduce fragmentation (I'm
> still thinking of remoting)?

Re: Should we replace MemoryStream?

by Andreas Nahr

I'm still not sure this is a good idea. A lot of this depends on the
use-case for MemoryStream:
1) If a MemoryStream is created with a parameterless constructor and then a
lot of data is written to it multiple times, the ChunkedStream will always
be better.
2) If a MemoryStream is created with a parameterless constructor and only
ever holds a few bytes, ChunkedStream might bring considerable overhead.
3) If a MemoryStream is created with a fixed size, then ChunkedStream will be
somewhat, but acceptably, slower and have a higher overhead. But it will be
totally abysmal once GetBuffer comes into play.
4) If a MemoryStream is constructed from a (large) byte array (in the
scientific field I'm coming from this is by far the most common usage I've
seen; that is basically using MemoryStream as a (read-only) Stream wrapper
around a byte array), then performance will be abysmal when constructing (if
you chunkify e.g. a 500MB byte array) AND again with GetBuffer (recreating
the array). So it would be O(n) or even O(2*n) instead of O(1).

It might be possible to create an implementation that can deal with all this
(it would need to have variable-sized buffers, keep things it gets passed in
the constructor alive with small overhead, etc.), but it will be quite
complex and come with a large base overhead. And even then the GetBuffer
O(n) problem remains in a few scenarios.

Maybe it would be better to just leave the class as is and document that for
certain scenarios alternative implementations are available that do a MUCH
better job. Everybody can easily replace the use of MemoryStream with an
alternative implementation if needed. But nobody expects this class to
behave completely differently from how it originally did (and seems to do in
MS.Net).

Andreas


Re: Should we replace MemoryStream?

by PABLOSANTOSLUAC@terra.es

I agree (especially thinking about the chunk-pool I mentioned) having
separate classes can be better, so that everyone can choose.


Re: Should we replace MemoryStream?

by Leszek Ciesielski

Choice is not always good, and I think this is one of the cases when
the default (i.e. the MemoryStream implementation) should make the
choices instead of presenting them to the user. Though I agree that the
case of constructing a MemoryStream from an existing byte[] would
require a special path in the code, as this is a stream that most
likely won't be resized, and in this case users are expecting the
constructor to have a complexity of O(1) and GetBuffer to also be
O(1). The same expectation is probably also true with a fixed-size
MemoryStream.

On Tue, Nov 10, 2009 at 1:09 PM, pablosantosluac@...
<pablosantosluac@...> wrote:

> I agree (especially thinking about the chunk-pool I mentioned) having
> separate classes can be better, so that everyone can choose.

Re: Should we replace MemoryStream?

by Bjorg

Allowing the first chunk to be variable-sized doesn't make the code
that much more complex.  This would mean that in read-only cases, all
operations would remain O(1) since the original byte array would be
preserved.  For write operations, new chunks would be allocated as
needed.  Determining which chunk to read from or write to would need
to take into account the first chunk size, but that's it.

For the case where someone initializes the ChunkedMemoryStream with an
existing byte array, then appends to it, and then calls GetBuffer(),
we would end up with the same overhead as before, since the
MemoryStream would have needed to reallocate the byte array when the
first append operation occurred, whereas the ChunkedMemoryStream does
it on GetBuffer().  However, if the array needed to be extended
multiple times due to many append operations, then the
ChunkedMemoryStream will come out ahead again since it only
reallocated the buffer once.  At that point, the reallocated buffer
could replace the first chunk so we don't do this again for repeated
calls to GetBuffer().
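The two paragraphs above can be sketched roughly as follows, in Python rather than C#. FirstChunkStream, locate, append and get_buffer are hypothetical names; the byte-at-a-time append is deliberately naive to keep the chunk lookup visible.

```python
class FirstChunkStream:
    """Sketch: caller's array is the first chunk, fixed-size chunks follow."""

    def __init__(self, initial, chunk_size=16 * 1024):
        self.first = initial           # kept by reference, no copy: O(1)
        self.chunk_size = chunk_size
        self.extra = []                # fixed-size chunks appended later
        self.length = len(initial)

    def locate(self, position):
        """Map a stream position to (chunk, offset inside that chunk)."""
        if position < len(self.first):
            return self.first, position
        # Past the first chunk: subtract its size, then divide as usual.
        index, offset = divmod(position - len(self.first), self.chunk_size)
        return self.extra[index], offset

    def append(self, data):
        for byte in data:
            tail = self.length - len(self.first)
            if tail == len(self.extra) * self.chunk_size:
                self.extra.append(bytearray(self.chunk_size))
            chunk, offset = self.locate(self.length)
            chunk[offset] = byte
            self.length += 1

    def get_buffer(self):
        if self.extra:
            # Consolidate once; the result becomes the new first chunk,
            # so repeated calls do not copy again.
            merged = bytearray(self.first) + b"".join(self.extra)
            self.first = merged[:self.length]
            self.extra = []
        return self.first
```

In the read-only case get_buffer() hands back the very array that was passed in; after appends it copies once and then keeps returning the consolidated array.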


On Nov 10, 2009, at 4:21 AM, Leszek Ciesielski wrote:

> Choice is not always good, and I think this is one of the cases when
> the default (i.e. the MemoryStream implementation) should make the
> choices instead of presenting them to the user. Though I agree that the
> case of constructing a MemoryStream from an existing byte[] would
> require a special path in the code, as this is a stream that most
> likely won't be resized and in this case users are expecting the
> constructor to have a complexity of O(1) and GetBuffer to also be
> O(1). The same expectation is probably also true with a fixed size
> MemoryStream.
>
> On Tue, Nov 10, 2009 at 1:09 PM, pablosantosluac@...
> <pablosantosluac@...> wrote:
>> I agree (especially thinking about the chunk-pool I mentioned) having
>> separate classes can be better, so that everyone can choose.
>>
>> Andreas Nahr wrote:
>>> I'm still not sure this is a good idea. A lot of this depends on the
>>> use-case for MemoryStream.
>>> 1) If a MemoryStream is created with a parameterless constructor and
>>> then a lot of data is written to it multiple times, the
>>> ChunkedStream will always be better.
>>> 2) If a MemoryStream is created with a parameterless constructor and
>>> only gets a few bytes long, ChunkedStream might bring considerable
>>> overhead.
>>> 3) If a MemoryStream is created with a fixed size, then ChunkedStream
>>> will be somewhat, but acceptably, slower and have a higher overhead.
>>> But it will be totally abysmal once GetBuffer comes into play.
>>> 4) If a MemoryStream is constructed from a (large) byte array (in the
>>> scientific field I'm coming from this is by far the most common
>>> usage I've seen; that is basically using MemoryStream as a
>>> (read-only) Stream-Wrapper around a byte array), then performance
>>> will be abysmal when constructing (if you chunkify e.g. a 500MB byte
>>> array) AND again with GetBuffer (recreate the array). So it would be
>>> O(n) or even O(2*n) instead of O(1).
>>>
>>> It might be possible to create an implementation that can deal with
>>> all this (would need to have variable-sized buffers, keep things it
>>> gets passed in the constructor alive with small overhead, etc.), but
>>> it will be quite complex and come with a large base overhead. And
>>> even then the GetBuffer O(n) problem remains in a few scenarios.
>>>
>>> Maybe it would be better to just leave the class as is and document
>>> that for certain scenarios alternative implementations are available
>>> that do a MUCH better job. Everybody can easily replace the use of
>>> MemoryStream with an alternative implementation if needed. But
>>> nobody expects this class to behave completely differently from how
>>> it originally did (and seems to do in MS.Net).
>>>
>>> Andreas
>>>
>>>


- Steve

--------------
Steve G. Bjorg
http://mindtouch.com
http://twitter.com/bjorg
irc.freenode.net #mindtouch



Re: Should we replace MemoryStream?

by Bjorg

After taking a closer look at the constructors for MemoryStream, I
need to amend my earlier response.  In all cases where a byte array is
passed into the constructor, the MemoryStream has fixed capacity.  So
the case I described in my earlier message, where a MemoryStream starts
off with a byte array and is then appended to, cannot happen.

In short, MemoryStream can operate in two modes: fixed and variable.
For fixed, there is nothing to be gained by using chunks; the existing
implementation is already optimal for all fixed-capacity cases.  This
is the situation that Andreas is referring to.

For variable, a chunked implementation will work better once the  
stream exceeds the maximum size for the first chunk.  Additional  
considerations for this case are: (a) should the first chunk have a  
smaller size initially to be more efficient for short streams, (b)  
should chunks be reusable and thus bypass the alloc/free cycle, (c)  
should a call to GetBuffer() automatically reset the first chunk with  
the newly created byte array?

Am I missing anything?



Re: Should we replace MemoryStream?

by PABLOSANTOSLUAC@terra.es

Hi,

> For variable, a chunked implementation will work better once the stream
> exceeds the maximum size for the first chunk.  Additional considerations
> for this case are: (a) should the first chunk have a smaller size
> initially to be more efficient for short streams, (b) should chunks be
> reusable and thus bypass the alloc/free cycle, (c) should a call to
> GetBuffer() automatically reset the first chunk with the newly created
> byte array?

I think (b) could be great, but it could just as well live in a separate
PooledChunkedMemoryStream class, or behind a different constructor.
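
To make (b) concrete, here is a rough sketch of a chunk pool, in Python
for brevity.  Everything in it is an assumption for illustration: the
ChunkPool name, the 16 KB chunk size, and the cap on idle chunks are all
hypothetical and do not describe any existing class.

```python
class ChunkPool:
    """Hypothetical pool that recycles fixed-size chunks."""

    def __init__(self, chunk_size=16 * 1024, max_idle=64):
        self.chunk_size = chunk_size   # assumed size, not a real default
        self.max_idle = max_idle       # cap on chunks kept for reuse
        self._idle = []

    def rent(self):
        # Reuse an idle chunk when one is available, else allocate.
        if self._idle:
            return self._idle.pop()
        return bytearray(self.chunk_size)

    def give_back(self, chunk):
        # Only same-sized chunks are reusable; extras past the cap are
        # simply dropped and left to the garbage collector.
        if len(chunk) == self.chunk_size and len(self._idle) < self.max_idle:
            chunk[:] = bytes(self.chunk_size)   # clear stale data in place
            self._idle.append(chunk)
```

A stream built on such a pool would rent chunks as it grows and give
them back when it is closed, so a busy server keeps recycling the same
arrays instead of allocating and freeing them.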





Re: Should we replace MemoryStream?

by Robert Jordan

Leszek Ciesielski wrote:
> Choice is not always good, and I think this is one of the cases when
> the default (i.e. the MemoryStream implementation) should make the
> choices instead of presenting them to the user. Though I agree that the
> case of constructing a MemoryStream from an existing byte[] would
> require a special path in the code, as this is a stream that most
> likely won't be resized and in this case users are expecting the
> constructor to have a complexity of O(1) and GetBuffer to also be
> O(1). The same expectation is probably also true with a fixed size
> MemoryStream.

MemoryStream.GetBuffer's docs indirectly suggest that no copy
will be performed:

"Note that the buffer contains allocated bytes which might be unused.
For example, if the string "test" is written into the MemoryStream
object, the length of the buffer returned from GetBuffer is 256, not 4,
with 252 bytes unused. To obtain only the data in the buffer, use the
ToArray method; however, ToArray creates a copy of the data in memory."

So MemoryStream.GetBuffer must remain an O(1) operation in any case,
defeating any kind of optimization a chunked memory stream
implementation may introduce.
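
The distinction the docs draw can be illustrated with a toy single-array
stream, sketched here in Python for brevity.  This is not the real
MemoryStream: the GrowableBuffer class and its method names are
hypothetical, the initial capacity of 256 simply mirrors the figure in
the quoted text, and the doubling growth policy is an assumption.

```python
class GrowableBuffer:
    """Toy single-array stream; not the real MemoryStream."""

    def __init__(self, capacity=256):
        self._buf = bytearray(capacity)
        self._length = 0

    def write(self, data):
        end = self._length + len(data)
        if end > len(self._buf):
            # Growing means allocating a bigger array and copying
            # (doubling is an assumed policy).
            bigger = bytearray(max(end, 2 * len(self._buf)))
            bigger[:self._length] = self._buf[:self._length]
            self._buf = bigger
        self._buf[self._length:end] = data
        self._length = end

    def get_buffer(self):
        # O(1): hands out the internal array, unused tail included.
        return self._buf

    def to_array(self):
        # O(n): copies only the bytes actually written.
        return bytes(self._buf[:self._length])
```

After writing the four bytes of "test", get_buffer() in this sketch
returns the same 256-byte array on every call, while to_array() returns
a fresh 4-byte copy.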

Robert


Re: Should we replace MemoryStream?

by Avery Pennarun-2

On Tue, Nov 10, 2009 at 8:48 AM, Robert Jordan <robertj@...> wrote:

> MemoryStream.GetBuffer's docs indirectly suggest that no copy
> will be performed:
>
> "Note that the buffer contains allocated bytes which might be unused.
> For example, if the string "test" is written into the MemoryStream
> object, the length of the buffer returned from GetBuffer is 256, not 4,
> with 252 bytes unused. To obtain only the data in the buffer, use the
> ToArray method; however, ToArray creates a copy of the data in memory."
>
> So MemoryStream.GetBuffer must remain an O(1) operation in any case,
> defeating any kind of optimization a chunked memory stream
> implementation may introduce.

Although this might be strictly true if you want to behave exactly as
Microsoft's documentation claims (I thought 100% compatibility with
.Net was not the primary goal of mono?), there may be other options
that result in similar performance.

For example, the first call to GetBuffer() could "coagulate" the
chunks into a single big array (perhaps with extra space at the end),
and then *keep that array*.  Subsequent calls to GetBuffer() could
avoid the copy.

In the event that your initial chunk wasn't big enough when pushing
data into the buffer in the first place, a non-chunked implementation
would have had to make an extra copy *anyway* at the time of the push.
So in the chunked implementation, the extra copy on the first
GetBuffer() is actually not an *extra* copy at all vs. the naive
single-buffer implementation.

(I've written an efficient implementation of chunked buffering in C++,
and these were the conclusions I drew after a lot of benchmarking of
my library.  YMMV in C#, etc.)
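
A rough sketch of the merge-once idea, in Python for brevity.  All names
are hypothetical; the 4096-byte chunk size, the 256-byte initial
capacity, and the doubling policy used for comparison are assumptions,
not statements about Mono's or .Net's MemoryStream.  Unlike the
suggestion above, the merged array here has no spare room at the end.

```python
class ChunkedBuffer:
    """Toy chunked stream whose get_buffer() merges chunks once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self._chunks = []        # bytearrays holding the written data
        self._sealed = False     # True once the last chunk was handed out
        self.bytes_copied = 0    # bytes moved by merges, for comparison

    def write(self, data):
        view = memoryview(bytes(data))
        while len(view):
            last = self._chunks[-1] if self._chunks else None
            # Start a new chunk when there is none, when the last one
            # is full, or when it was already returned by get_buffer().
            if last is None or self._sealed or len(last) == self.chunk_size:
                last = bytearray()
                self._chunks.append(last)
                self._sealed = False
            room = self.chunk_size - len(last)
            last += view[:room]
            view = view[room:]

    def get_buffer(self):
        if len(self._chunks) == 1:
            # Already contiguous: hand out the array without copying.
            self._sealed = True
            return self._chunks[0]
        merged = bytearray().join(self._chunks)
        self.bytes_copied += len(merged)
        self._chunks = [merged]   # keep the merged array as chunk 0
        self._sealed = True
        return merged


def doubling_copy_cost(total, initial=256):
    """Bytes copied by a single array that doubles each time it is full."""
    capacity, copied = initial, 0
    while capacity < total:
        copied += capacity        # the whole old array moves on growth
        capacity *= 2
    return copied
```

Under these assumptions, filling 1 MiB costs the doubling buffer about
as many copied bytes during the writes as the chunked buffer pays in
its single merge, and repeated get_buffer() calls with no writes in
between copy nothing further.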

Have fun,

Avery