memcached and streaming

View: New views
8 Messages — Rating Filter:   Alert me  

memcached and streaming

by Tomash Brechko :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello,

In memcached.c there's a line,

  setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, (void *)&flags, sizeof(flags));

We should realise that this is a serious performance killer for
streaming.  For instance, C::M::F has multi-update commands that are
capable to bundle several requests into one network packet.  However
replies are always sent _one at a time_.  I.e. tons of packets with
single 'STORED' etc. (>40 bytes header and 6 bytes of data---not the
best ratio).  Not to say the the greater number of packets means a
huge increase of latency in a congested network.

memcached should have in its TODO "Disable TCP_NODELAY and enable
TCP_CORK/TCP_NOPUSH for streaming".  The server may tell if the
streaming is used either by extending the protocol to support new
keyword, 'more' (specified for all commands but last; my experiments
with norepy keyword show that "modifying the parser would kill the
performance" are just evil lies ;)).  Or it may use automatic
heuristic: if it parsed a command, but there's more data in the input
buffer, then it doesn't have to push the reply until it process that
data too.  Or both.

get with 100 keys has about 3x smaller wallclock time than 100 single
key gets bundled together, on otherwise free network (and even about
2x on loopback).


--
   Tomash Brechko

Re: memcached and streaming

by Maxim Dounin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello!

On Sun, Feb 03, 2008 at 05:32:16PM +0300, Tomash Brechko wrote:

>Hello,
>
>In memcached.c there's a line,
>
>  setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, (void *)&flags, sizeof(flags));
>
>We should realise that this is a serious performance killer for
>streaming.  For instance, C::M::F has multi-update commands that are
>capable to bundle several requests into one network packet.  However
>replies are always sent _one at a time_.  I.e. tons of packets with
>single 'STORED' etc. (>40 bytes header and 6 bytes of data---not the
>best ratio).  Not to say the the greater number of packets means a
>huge increase of latency in a congested network.
>
>memcached should have in its TODO "Disable TCP_NODELAY and enable
>TCP_CORK/TCP_NOPUSH for streaming".  The server may tell if the

Just a quick note: the code to set/clear TCP_NOPUSH was here
before iov work was merged from facebook at r320 (i.e. in 1.1.*
branch).

Maxim Dounin

Re: memcached and streaming

by Dormando :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>>
>> We should realise that this is a serious performance killer for
>> streaming.  For instance, C::M::F has multi-update commands that are
>> capable to bundle several requests into one network packet.  However
>> replies are always sent _one at a time_.  I.e. tons of packets with
>> single 'STORED' etc. (>40 bytes header and 6 bytes of data---not the
>> best ratio).  Not to say the the greater number of packets means a
>> huge increase of latency in a congested network.
>>
>> memcached should have in its TODO "Disable TCP_NODELAY and enable
>> TCP_CORK/TCP_NOPUSH for streaming".  The server may tell if the
>
> Just a quick note: the code to set/clear TCP_NOPUSH was here before iov
> work was merged from facebook at r320 (i.e. in 1.1.* branch).
>
> Maxim Dounin

Yeah; I want to rewrite the way the iov code works... It's not flexible
enough, relies on mallocing lists _per conn structure_ instead of a
linked list, and in certain conditions uses too many packets.

Under an mget response the iov stuff works great. Otherwise it's
probably still fine since most people don't stream changes. It'd be nice
to handle all cases better, so I can remove things like the text
protocol CAS hack, and for the few folks who stream changes around.

-Dormando

Re: memcached and streaming

by Tomash Brechko :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, Feb 03, 2008 at 19:39:59 -0800, dormando wrote:
> > Just a quick note: the code to set/clear TCP_NOPUSH was here before iov
> > work was merged from facebook at r320 (i.e. in 1.1.* branch).

Yep, thanks, I've seen that changeset too.  But I had an impression
that while having TCP_NOPUSH it did something else with it (like
switching to it during a single reply, because then each chunk was
sent separately, there was no iov stuff).


> Yeah; I want to rewrite the way the iov code works... It's not flexible
> enough, relies on mallocing lists _per conn structure_ instead of a
> linked list, and in certain conditions uses too many packets.

Well, you may do that, but this has nothing to do with my proposal.  I
simply suggest to switch between TCP_NODELAY and TCP_CORK/TCP_NOPUSH
depending on the situation.


> Under an mget response the iov stuff works great. Otherwise it's
> probably still fine since most people don't stream changes.

...And binary protocol got rid of mget, because everyone is supposed
to go streaming ;).


It probably would be cleaner to just implement it myself and show the
patch, but I have no time for that right now, only can talk big :).


--
   Tomash Brechko

Re: memcached and streaming

by Dormando :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Tomash Brechko wrote:

> On Sun, Feb 03, 2008 at 19:39:59 -0800, dormando wrote:
>>> Just a quick note: the code to set/clear TCP_NOPUSH was here before iov
>>> work was merged from facebook at r320 (i.e. in 1.1.* branch).
>
> Yep, thanks, I've seen that changeset too.  But I had an impression
> that while having TCP_NOPUSH it did something else with it (like
> switching to it during a single reply, because then each chunk was
> sent separately, there was no iov stuff).
>
>
>> Yeah; I want to rewrite the way the iov code works... It's not flexible
>> enough, relies on mallocing lists _per conn structure_ instead of a
>> linked list, and in certain conditions uses too many packets.
>
> Well, you may do that, but this has nothing to do with my proposal.  I
> simply suggest to switch between TCP_NODELAY and TCP_CORK/TCP_NOPUSH
> depending on the situation.

I think it'd be easier to do that if the iov stuff was refactored a bit,
partially because...

>
> ...And binary protocol got rid of mget, because everyone is supposed
> to go streaming ;).

... we'd need to use it to refactor the binary tree to not write single
packets for mget. This is one of my major concerns for the binary tree
as-is, but I've had no time to really figure it out. While cork/nopush
should give us the desired effect of coalescing packets, we'll still
save significant performance (especially on the BSD's) if we can combine
the syscalls down some.

>
> It probably would be cleaner to just implement it myself and show the
> patch, but I have no time for that right now, only can talk big :).
>

Wank ;)

-Dormando


Re: memcached and streaming

by Dustin Sallings :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Feb 3, 2008, at 23:05, dormando wrote:

>> ...And binary protocol got rid of mget, because everyone is supposed
>> to go streaming ;).
>
> ... we'd need to use it to refactor the binary tree to not write  
> single
> packets for mget. This is one of my major concerns for the binary tree
> as-is, but I've had no time to really figure it out. While cork/nopush
> should give us the desired effect of coalescing packets, we'll still
> save significant performance (especially on the BSD's) if we can  
> combine
> the syscalls down some.


        I'm using all the iov stuff assuming it will do the right thing.  
What it does not do, however, is explicitly state that the server  
should stop and then start transmitting.

        Basically, the reception of the first silent get should cork, and the  
next command that isn't silent should uncork.

--
Dustin Sallings




Re: memcached and streaming

by Dormando :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Dustin Sallings wrote:

>
> On Feb 3, 2008, at 23:05, dormando wrote:
>
>>> ...And binary protocol got rid of mget, because everyone is supposed
>>> to go streaming ;).
>>
>> ... we'd need to use it to refactor the binary tree to not write single
>> packets for mget. This is one of my major concerns for the binary tree
>> as-is, but I've had no time to really figure it out. While cork/nopush
>> should give us the desired effect of coalescing packets, we'll still
>> save significant performance (especially on the BSD's) if we can combine
>> the syscalls down some.
>
>
>     I'm using all the iov stuff assuming it will do the right thing.
> What it does not do, however, is explicitly state that the server should
> stop and then start transmitting.
>
>     Basically, the reception of the first silent get should cork, and
> the next command that isn't silent should uncork.
>

Yup. Ah shit, I should just cram through that this week. I'll ping back
with some updates.

So I think the plan is to toss out some -rc's on wednesday. 1.2.5-rc and
1.3.0-rc, with the two diverging a little bit until we can confirm the
streaming performance of the binary tree is acceptable.

Then the two can be quickly merged and Dustin and Tomash given medals of
honor and purple hearts for the amount of pain it's all gone through.

-Dormando

Re: memcached and streaming

by Dustin Sallings :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


   That'd be great.  I have one more patch I submitted and have been  
waiting to go through.  Once that's committed, I can't think of any  
more protocol changed and would like to release my 2.0 of my client.

--
Dustin Sallings (mobile)

On Feb 4, 2008, at 0:04, dormando <dormando@...> wrote:

> Dustin Sallings wrote:
>>
>> On Feb 3, 2008, at 23:05, dormando wrote:
>>
>>>> ...And binary protocol got rid of mget, because everyone is  
>>>> supposed
>>>> to go streaming ;).
>>>
>>> ... we'd need to use it to refactor the binary tree to not write  
>>> single
>>> packets for mget. This is one of my major concerns for the binary  
>>> tree
>>> as-is, but I've had no time to really figure it out. While cork/
>>> nopush
>>> should give us the desired effect of coalescing packets, we'll still
>>> save significant performance (especially on the BSD's) if we can  
>>> combine
>>> the syscalls down some.
>>
>>
>>    I'm using all the iov stuff assuming it will do the right thing.
>> What it does not do, however, is explicitly state that the server  
>> should
>> stop and then start transmitting.
>>
>>    Basically, the reception of the first silent get should cork, and
>> the next command that isn't silent should uncork.
>>
>
> Yup. Ah shit, I should just cram through that this week. I'll ping  
> back
> with some updates.
>
> So I think the plan is to toss out some -rc's on wednesday. 1.2.5-rc  
> and
> 1.3.0-rc, with the two diverging a little bit until we can confirm the
> streaming performance of the binary tree is acceptable.
>
> Then the two can be quickly merged and Dustin and Tomash given  
> medals of
> honor and purple hearts for the amount of pain it's all gone through.
>
> -Dormando
>
>