Erlang message passing delay after abnormal network disconnection
Hi,

I am experiencing a high message passing delay between 2 Erlang nodes after an abnormal network disconnection. The 2 nodes are on a WAN, with multiple hubs, switches, routers, etc. in between them. If the receiving Erlang node is stopped gracefully, the delay doesn't arise. Calling net_adm:ping/1 on that node returns "pang" with no delay. However, gen_event:notify/2, gen_server:cast/2, etc. take about 10 seconds to return.

What's the issue and how can this be avoided?

Thanks,
- Eranga
_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions
|
|
Re: Erlang message passing delay after abnormal network disconnection
On 03/03/2008, Eranga Udesh <eranga.erl@...> wrote:
> Hi,
>
> I am experiencing a high message passing delay between 2 Erlang nodes, after
> an abnormal network disconnection. Those 2 nodes are in a WAN and there are
> multiple Hubs, Switches, Routes, etc., in between them. If the message
> receiving Erlang node stopped gracefully, the delay doesn't arise. Doing
> net_adm:ping/1 to that node results no delay "pang". However
> gen_event:notify/2, gen_server:cast/2, etc. are waiting for about 10 seconds
> to return.
>
> What's the issue and how this can be avoided?

Have you tried putting a snoop to see whether the delay is on the sending/receiving side?

This might be useful: http://www.erlang.org/contrib/erlsnoop-1.0.tgz

cheers
Chandru
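
Alongside a packet snoop, the sender-side delay can also be measured directly from the Erlang shell. The following is a minimal sketch, not something posted in the thread: the module name cast_timing and both function names are hypothetical, and it only wraps gen_server:cast/2 in timer:tc/1.

```erlang
-module(cast_timing).
-export([timed/1, timed_cast/2]).

%% Run Fun and return {ElapsedMilliseconds, Result}.
timed(Fun) when is_function(Fun, 0) ->
    {Micros, Result} = timer:tc(Fun),
    {Micros div 1000, Result}.

%% Time a gen_server:cast/2 to a (possibly remote) server reference,
%% for example {my_server, 'b@host2'} (hypothetical names).
timed_cast(ServerRef, Msg) ->
    timed(fun() -> gen_server:cast(ServerRef, Msg) end).
```

If timed_cast/2 reports a value of several seconds while the link is down, the caller is being held up on the sending node, which narrows the search before any packet capture is examined.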
|
|
Re: Erlang message passing delay after abnormal network disconnection
The problem occurs when network connectivity is broken abnormally. The receiving node does not receive messages, and the sending processes are blocked because the message delivery calls (gen_event:notify/2, etc.) take about 10 seconds to return. We checked the implementation of these calls and noticed that the functions wait until the messages are delivered to the receiving node. Is there a way (a system flag, maybe) to avoid this blocking and return immediately?

BRgds,
- Eranga

On Mon, Mar 3, 2008 at 6:51 PM, Chandru <chandrashekhar.mullaparthi@...> wrote:
|
|
Re: Erlang message passing delay after abnormal network disconnection
It sounds as if the sending node is blocked in auto-connect.

Try the kernel environment variable {dist_auto_connect, once}. It will ensure that any attempt to send to a disconnected node immediately fails. If one of the nodes restarts, they will automatically reconnect, as usual.

You can explicitly connect the two nodes by calling net_kernel:connect(Node).

BR,
Ulf W

Eranga Udesh wrote:
> The problem occurs when the network connectivity is broken (abnormally).
> The receiving node is not receiving messages. The sending processes are
> blocked, since those message delivery calls (gen_event:notify/s, etc)
> are waiting for about 10 secs to return. We checked the implementation
> of such calls and notice, the functions are waiting until the messages
> are delivered to the receiving node. Is there's a way (a system flag may
> be) to avoid such blocking and to return immediately?
>
> BRgds,
> - Eranga
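
For readers who want to try the dist_auto_connect suggestion, kernel parameters are usually set in a config file or on the command line. This is a minimal sketch under stated assumptions: the file name sys.config and the node name a are illustrative and do not come from the thread.

```erlang
%% sys.config (start the node with: erl -sname a -config sys)
%% Equivalent command-line form: erl -sname a -kernel dist_auto_connect once
[
 {kernel, [
   %% Connect to each node automatically only once; after a connection
   %% goes down it must be re-established explicitly.
   {dist_auto_connect, once}
 ]}
].
```

In current OTP releases the documented call for reconnecting by hand is net_kernel:connect_node/1, for example net_kernel:connect_node('b@host2') with a hypothetical node name; it returns true when the connection is established.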
|
|
Re: Erlang message passing delay after abnormal network disconnection
I can reproduce the behavior by stopping the network interface on the far node (Linux ifdown). That machine runs the connected Erlang node, which was receiving the messages. I wonder whether this is how the Erlang implementation behaves or whether it is specific to this particular setup.

Also, I use HiPE. I'll try what you suggested below, and also try without HiPE.

Thanks,
- Eranga

On Tue, Mar 4, 2008 at 2:08 PM, Ulf Wiger (TN/EAB) <ulf.wiger@...> wrote:
|
|
Re: Erlang message passing delay after abnormal network disconnection
Eranga Udesh wrote:
> I can regenerate the behavior by stopping the network interface in the
> far node (linux ifdown). That runs the connected Erlang node, which was
> receiving the messages. I wonder if this how the Erlang implementation
> is or local to this particular setup.
>
> Also I use HIPE. I'll try what you suggested below and also without HIPE.

Why would HiPE have some effect in what you are describing?

Kostis
|
|
Re: Erlang message passing delay after abnormal network disconnection
When connectivity is broken abnormally, the sending node will detect this within 45-60 seconds by default. This can be changed with the net_ticktime environment variable in the kernel application.

Before the detection, the sending node will try to send the message, and if that is not possible the message will be queued in the inet driver. If the queue gets bigger than a certain maximum, a so-called "busy port" condition occurs, which blocks the sending Erlang process. This happens when the receiving side of the distribution socket does not read what is sent to it, which is the case when you have no connectivity.

Another scenario is that the receiving node is detected as down and an auto-connect (including handshake) is performed for the first message sent after the broken connection. This will take on the order of 10 seconds before timing out.

If you want to avoid this for a very crucial process (i.e. avoid blocking that particular Erlang process), you can send the message with erlang:send_nosuspend/2 or erlang:send_nosuspend/3.
Warning! These functions should be used with extreme care. Read the manual!

Note that this has nothing to do with HiPE (i.e. native code). An abnormal termination of connectivity, for example by unplugging the network cable, will have this effect.

/Kenneth
Erlang/OTP team, Ericsson

On 3/4/08, Eranga Udesh <eranga.erl@...> wrote:
> The problem occurs when the network connectivity is broken (abnormally). The
> receiving node is not receiving messages. The sending processes are
> blocked, since those message delivery calls (gen_event:notify/s, etc) are
> waiting for about 10 secs to return. We checked the implementation of such
> calls and notice, the functions are waiting until the messages are delivered
> to the receiving node. Is there's a way (a system flag may be) to avoid such
> blocking and to return immediately?
>
> BRgds,
> - Eranga
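
The send_nosuspend advice above can be wrapped in a small helper. This is a minimal sketch, not code from the thread: the module name nosuspend_send and the function try_send/2 are hypothetical, and passing the noconnect option (so that a send to a disconnected node does not trigger an auto-connect) is an assumption about what the caller wants.

```erlang
-module(nosuspend_send).
-export([try_send/2]).

%% Try to send Msg to Dest without suspending the calling process and
%% without triggering an auto-connect to a disconnected node.
%% Returns ok if the message was sent, or {error, not_sent} if sending
%% would have required suspending the caller or connecting to the node.
try_send(Dest, Msg) ->
    case erlang:send_nosuspend(Dest, Msg, [noconnect]) of
        true  -> ok;
        false -> {error, not_sent}
    end.
```

The trade-off is that a false return means the message was dropped, so the caller has to decide whether to retry, buffer, or discard it. That is one reason the warning above about reading the manual matters.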
|
|
Re: Erlang message passing delay after abnormal network disconnection
Thanks for the info; it makes sense.

In a "busy port" situation, do queued messages get discarded if the queue grows beyond a maximum? Is it FIFO or LIFO? Is there a way to configure this message queue size? Can one inet_drv "busy port" block communication with all other connected (live) nodes?

As I said before, net_adm:ping/1 returns "pang" immediately. Then why doesn't the message delivery function identify that the remote node is inaccessible and return immediately with an error?

How is message delivery implemented in Erlang? Does the call return as soon as the message is handed over to the local inet_drv, or only after it is delivered to the receiving Erlang node's inet_drv and a confirmation is received?

- Eranga

On Tue, Mar 4, 2008 at 10:28 PM, Kenneth Lundin <kenneth.lundin@...> wrote:
> When connectivity is broken abnormally the sending node will detect
|
|
Re: Erlang message passing delay after abnormal network disconnection
It's just a guess. When compiling natively, I've run into some problems from time to time:
- Garbage collection is not working well. The old heap is kept unnecessarily. I've written about this to the list once before.
- Code loading crashes a running process more often.

It probably has nothing to do with HiPE, but I thought I would simulate the same scenario without HiPE and check.

- Eranga

On Tue, Mar 4, 2008 at 8:55 PM, Kostis Sagonas <kostis@...> wrote:
|
|
Re: Erlang message passing delay after abnormal network disconnection
Hi, everyone. I've read forward in the thread ... and am wondering if there's a simpler cause?

Since the default distribution mechanism rides on top of TCP, the delay might be caused by TCP's exponential back-off when packet loss is encountered? A quick packet capture could verify this theory: there would be a big delay after the network partition is fixed (i.e. plug cable back in, "ifconfig {IFACE} up", whatever) and before the next packet (in either direction) is transmitted.

-Scott
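
To give a feel for how quickly exponential back-off stretches the gap between retransmissions, here is a small worked sketch. The module name rto_backoff is hypothetical, and the starting timeout of 1 second and the cap of 60 seconds used below are illustrative assumptions, not the settings of any particular TCP stack.

```erlang
-module(rto_backoff).
-export([intervals/3]).

%% Return the first N retransmission waits (in seconds), starting at
%% InitialRto and doubling after each unanswered attempt, capped at MaxRto.
intervals(InitialRto, MaxRto, N) when is_integer(N), N >= 0 ->
    intervals(InitialRto, MaxRto, N, []).

intervals(_Rto, _Max, 0, Acc) ->
    lists:reverse(Acc);
intervals(Rto, Max, N, Acc) ->
    intervals(min(Rto * 2, Max), Max, N - 1, [Rto | Acc]).
```

With these assumed values, rto_backoff:intervals(1, 60, 8) gives [1,2,4,8,16,32,60,60]. After a few minutes of partition the sender may therefore wait up to a minute before its next retransmission, which is the kind of gap a packet capture would show after the link comes back.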
|
|
Re: Erlang message passing delay after abnormal network disconnection
The problem I am talking about occurs while the network is partitioned. When the network connection is re-established and the Erlang node is reconnected with net_adm:ping/1, the message queue drains quickly and the nodes start working normally.

As I said before, this delay occurs only after an abnormal network disconnection. If the receiving Erlang node is shut down gracefully, the message delay doesn't occur. I suspect this occurs only when the packets sent out are going into a black hole and nothing responds to say that the destination TCP entity is unavailable.

- Eranga

On Wed, Mar 5, 2008 at 12:21 AM, Scott Lystig Fritchie <fritchie@...> wrote:
> Hi, everyone. I've read forward in the thread ... and am wondering if
|
|
Re: Erlang message passing delay after abnormal network disconnection
Excellent, the net_ticktime environment variable works. Thanks for the advice.

I would still appreciate knowing how inet_drv behaves, per the questions I asked in my previous email.

Cheers,
- Eranga

On Tue, Mar 4, 2008 at 10:28 PM, Kenneth Lundin <kenneth.lundin@...> wrote:
> When connectivity is broken abnormally the sending node will detect
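
For anyone repeating this fix, the tick time is a kernel application parameter and can be set like any other. This is a minimal sketch: the value 20 is purely illustrative (the thread does not say what value was used), and the file name sys.config is an assumption.

```erlang
%% sys.config (start the node with: erl -sname a -config sys)
%% Equivalent command-line form: erl -sname a -kernel net_ticktime 20
[
 {kernel, [
   %% Tick time in seconds; the default is 60. A smaller value makes a
   %% dead connection noticed sooner, at the cost of more tick traffic.
   {net_ticktime, 20}
 ]}
].
```

The kernel documentation notes that all communicating nodes should use the same tick time, so the setting belongs in the configuration of every node in the cluster.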