|
View:
New views
8 Messages
—
Rating Filter:
Alert me
|
|
|
Distributing thingsHello,
I'm trying to get my head around Erlang, and I've decided to try to take an existing program, and distribute it. I'm using code from Joe's Erlang book which generates a prime of X digits. On my laptop, anything over 500 digits takes a while. If you have one a single node running the code, it just loops, calculates, tests if it's a prime, and increments the number, and repeats if it's not a prime. It's the "is_prime" test that seems to take the time, so that's the part I want to distribute. I imagine if you have more than one node, you want to kick off "n" is_prime tests, but of different values. What is the best way of going about this? Is it to kick off n, and wait for all to return before kicking off another n? This isn't so good if one of the nodes takes a very long time to return, as the rest are doing nothing. Or have a "global" "next value" variable, and each node just gets the next one when it's finished its current one? What would be the best way of maintaining this "next value" variable? How can I handle additional nodes "coming on line" as it's running? How can I handle a node going offline in the middle of this? Would you all just use the existing nodes() nodelist to do this, or would you need to extend that in some way? Is there a good tried and tested method of deciding how to break down a large job of 1000000 small jobs into batches for n nodes? Hope these questions makes sense to you, C ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
|
|
Re: Distributing thingsOn Sat, Nov 7, 2009 at 7:53 PM, Calum <caluml@...> wrote:
> Hello, > > I'm trying to get my head around Erlang, and I've decided to try to > take an existing program, and distribute it. > I'm using code from Joe's Erlang book which generates a prime of X > digits. On my laptop, anything over 500 digits takes a while. > > If you have one a single node running the code, it just loops, > calculates, tests if it's a prime, and increments the number, and > repeats if it's not a prime. > It's the "is_prime" test that seems to take the time, so that's the > part I want to distribute. Hang on - the algorithm just generates a random integer and tests if it is a prime, if it's not a prime it tries again. It has to generate many candidates before it finds a prime. Why not start N parallel processes each of which generates and tests candidates then when either one of them succeeds you can stop. Sounds easier to me. /Joe > I imagine if you have more than one node, you want to kick off "n" > is_prime tests, but of different values. > > What is the best way of going about this? Is it to kick off n, and > wait for all to return before kicking off another n? This isn't so > good if one of the nodes takes a very long time to return, as the rest > are doing nothing. > Or have a "global" "next value" variable, and each node just gets the > next one when it's finished its current one? > What would be the best way of maintaining this "next value" variable? > > How can I handle additional nodes "coming on line" as it's running? > How can I handle a node going offline in the middle of this? > Would you all just use the existing nodes() nodelist to do this, or > would you need to extend that in some way? > > Is there a good tried and tested method of deciding how to break down > a large job of 1000000 small jobs into batches for n nodes? > > > Hope these questions makes sense to you, > > C > > ________________________________________________________________ > erlang-questions mailing list. See http://www.erlang.org/faq.html > erlang-questions (at) erlang.org > > ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
|
|
Re: Distributing thingsHi,
the scenario I imagined for this exercise, based on Calum's text would be something like assigning a range of numbers to each node, starting with the lowest possible given by the entry parameter (the # of digits), controlling its distribution and wrapping things up when a prime number is found by one of the nodes. An example: 4 nodes, generating a prime with 6 digits, would start at 100.000. The entry point would assign a range from 100.000 to 105.000 to node 1, 105.001 to 110.000 to node 2, and so on. When a node finishes traversing a range, a new block is assigned to it, until one of them returns a valid value, and processing stops. Maybe(?) this is a weird construction, but the idea is to explore parallel processing of series, rather than disconected calls from many clients. I'm new to Erlang too, just finished reading Joe's book (BTW, great work!!), and am now exploring possibilities. Don't know if this matches with Calum's idea, but it's something I would like to explore :-) Cheers, Fabio Mazanatti On Sat, Nov 7, 2009 at 6:13 PM, Joe Armstrong <erlang@...> wrote: > On Sat, Nov 7, 2009 at 7:53 PM, Calum <caluml@...> wrote: > > Hello, > > > > I'm trying to get my head around Erlang, and I've decided to try to > > take an existing program, and distribute it. > > I'm using code from Joe's Erlang book which generates a prime of X > > digits. On my laptop, anything over 500 digits takes a while. > > > > If you have one a single node running the code, it just loops, > > calculates, tests if it's a prime, and increments the number, and > > repeats if it's not a prime. > > It's the "is_prime" test that seems to take the time, so that's the > > part I want to distribute. > > Hang on - the algorithm just generates a random integer and tests if > it is a prime, > if it's not a prime it tries again. It has to generate many candidates > before it finds a > prime. Why not start N parallel processes each of which generates and > tests candidates > then when either one of them succeeds you can stop. Sounds easier to me. > > /Joe > > > > I imagine if you have more than one node, you want to kick off "n" > > is_prime tests, but of different values. > > > > What is the best way of going about this? Is it to kick off n, and > > wait for all to return before kicking off another n? This isn't so > > good if one of the nodes takes a very long time to return, as the rest > > are doing nothing. > > Or have a "global" "next value" variable, and each node just gets the > > next one when it's finished its current one? > > What would be the best way of maintaining this "next value" variable? > > > > How can I handle additional nodes "coming on line" as it's running? > > How can I handle a node going offline in the middle of this? > > Would you all just use the existing nodes() nodelist to do this, or > > would you need to extend that in some way? > > > > Is there a good tried and tested method of deciding how to break down > > a large job of 1000000 small jobs into batches for n nodes? > > > > > > Hope these questions makes sense to you, > > > > C > > > > ________________________________________________________________ > > erlang-questions mailing list. See http://www.erlang.org/faq.html > > erlang-questions (at) erlang.org > > > > > > ________________________________________________________________ > erlang-questions mailing list. See http://www.erlang.org/faq.html > erlang-questions (at) erlang.org > > |
|
|
Re: Distributing thingsOn Sat, Nov 7, 2009 at 8:13 PM, Joe Armstrong <erlang@...> wrote:
> Hang on - the algorithm just generates a random integer and tests if > it is a prime, > if it's not a prime it tries again. It has to generate many candidates > before it finds a > prime. Why not start N parallel processes each of which generates and > tests candidates > then when either one of them succeeds you can stop. Sounds easier to me. > > /Joe Heya Joe (crikey I feel honoured!) I thought it picked a random number, and then tested N-1 over and over to see if it was prime. I was trying to think how to coordinate the N value among all the processes so that they wouldn't try the same ones. I.e.: Main program selects 1234567 as a random number, and knows it has 3 Erlang nodes it can work on Node one tries 1234566 Node two tries 1234565 Node three tries 1234564 Then, when node one has found out that 1234566 isn't a prime (unsurprisingly!), it would have to know where the other nodes were at so it could know to check 1234563. Although I suppose it doesn't really matter (for this program) if they're all checking 123456* - but for some applications, they'd need to be in sync. The other thing I was wondering was what would happen if I fired up a new node in the middle of this running - how would it join in? Are there tried-and-tested software patterns to handle this sort of dynamic loading? C ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
|
|
Re: Distributing thingsOn Sat, Nov 7, 2009 at 8:55 PM, Fabio Mazanatti
<fabio.mazanatti@...> wrote: > Hi, > > the scenario I imagined for this exercise, based on Calum's text would be > something like assigning a range of numbers to each node, starting with the > lowest possible given by the entry parameter (the # of digits), controlling > its distribution and wrapping things up when a prime number is found by one > of the nodes. > > An example: 4 nodes, generating a prime with 6 digits, would start at > 100.000. The entry point would assign a range from 100.000 to 105.000 to > node 1, 105.001 to 110.000 to node 2, and so on. When a node finishes > traversing a range, a new block is assigned to it, until one of them returns > a valid value, and processing stops. Hello Fabio, Yes, this is the sort of thing I would do for this problem. However, for other problems, perhaps the splitting method isn't so clearcut. Also, your method only really allows for 20 nodes - what would you do if you suddenly had 50 nodes available to join in? Actually, in the real world, do Erlang node clusters vary in size a lot during operation, or do most people know how many nodes they have available at the start, and just program for that? Is my attempting to cope with varied node numbers something that just isn't really needed? ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
|
|
Re: Distributing thingsHi Calum,
as I said, my scenario may be weird, and indeed it is :-) But I can't see why it wouldn't use as many nodes as possible - I defined that the range interval was 5000, but could be any other number, fixed or derived from a calculation considering the # or digits and the # of active nodes, for instance. Anyway, my goal was to put an example of coordination and distribution. if each node works on a single number or a predefined range is not relevant - someone (a central module, for instance) has to control and pass the information along. You may know how many nodes will be available at production, but most of time you don't build your solution considering it. I'm working actively with Java EE, and always built applications targeting "clusters" (with any number of members), not a specific number of instances. If you need more power, you just add more nodes (or instances) to the environment. So, your idea of abstracting the number of actual nodes of a solution is the right way to go, at least to me :-) Ciao! On Sat, Nov 7, 2009 at 7:02 PM, Calum <caluml@...> wrote: > On Sat, Nov 7, 2009 at 8:55 PM, Fabio Mazanatti > <fabio.mazanatti@...> wrote: > > Hi, > > > > the scenario I imagined for this exercise, based on Calum's text would be > > something like assigning a range of numbers to each node, starting with > the > > lowest possible given by the entry parameter (the # of digits), > controlling > > its distribution and wrapping things up when a prime number is found by > one > > of the nodes. > > > > An example: 4 nodes, generating a prime with 6 digits, would start at > > 100.000. The entry point would assign a range from 100.000 to 105.000 to > > node 1, 105.001 to 110.000 to node 2, and so on. When a node finishes > > traversing a range, a new block is assigned to it, until one of them > returns > > a valid value, and processing stops. > > Hello Fabio, > > Yes, this is the sort of thing I would do for this problem. > However, for other problems, perhaps the splitting method isn't so > clearcut. > Also, your method only really allows for 20 nodes - what would you do > if you suddenly had 50 nodes available to join in? > > Actually, in the real world, do Erlang node clusters vary in size a > lot during operation, or do most people know how many nodes they have > available at the start, and just program for that? > Is my attempting to cope with varied node numbers something that just > isn't really needed? > |
|
|
Re: Distributing thingsOn Sat, Nov 07, 2009 at 09:02:39PM +0000, Calum wrote:
> On Sat, Nov 7, 2009 at 8:55 PM, Fabio Mazanatti > <fabio.mazanatti@...> wrote: > > Hi, > > > > the scenario I imagined for this exercise, based on Calum's text would be > > something like assigning a range of numbers to each node, starting with the > > lowest possible given by the entry parameter (the # of digits), controlling > > its distribution and wrapping things up when a prime number is found by one > > of the nodes. > > > > An example: 4 nodes, generating a prime with 6 digits, would start at > > 100.000. The entry point would assign a range from 100.000 to 105.000 to > > node 1, 105.001 to 110.000 to node 2, and so on. When a node finishes > > traversing a range, a new block is assigned to it, until one of them returns > > a valid value, and processing stops. > > Yes, this is the sort of thing I would do for this problem. > However, for other problems, perhaps the splitting method isn't so clearcut. So for these other problems, you would use a different method of distribution. For this problem, chunking the search space and distributing it to clients is a good method, so use it. "Right tool for the job" and all that. > Also, your method only really allows for 20 nodes - what would you do > if you suddenly had 50 nodes available to join in? Since the controlling process keeps control of the chunks, if more nodes join in mid-job, they just ask the controller for the next available chunk and go to it. > Actually, in the real world, do Erlang node clusters vary in size a > lot during operation, or do most people know how many nodes they have > available at the start, and just program for that? Honestly, I don't think that Erlang is the tool of choice for many big iron distributed computation problems. That's not to say that I don't think it's a good option (especially if you have more highly-optimised native code doing the number crunching, for maximum speed) but just that it isn't as widely used as the other, more well-known methods. > Is my attempting to cope with varied node numbers something that just > isn't really needed? It's your problem, not ours -- if it's necessary to solve your problem, then it's needed. Personally, I'd try the simplest thing that could possibly work first, which would be spawning a fixed number of processes to do the crunching, but if that was too slow in benchmarking (run a subset of the problem, time it, extrapolate) and, say, I wanted to start the search now but I had a pile of new hardware coming online next month, then I'd make sure that I could increase the number of available workers dynamically. - Matt ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
|
|
Re: Distributing thingsOn Sat, Nov 7, 2009 at 9:57 PM, Calum <caluml@...> wrote:
> On Sat, Nov 7, 2009 at 8:13 PM, Joe Armstrong <erlang@...> wrote: >> Hang on - the algorithm just generates a random integer and tests if >> it is a prime, >> if it's not a prime it tries again. It has to generate many candidates >> before it finds a >> prime. Why not start N parallel processes each of which generates and >> tests candidates >> then when either one of them succeeds you can stop. Sounds easier to me. >> >> /Joe > > Heya Joe (crikey I feel honoured!) > > I thought it picked a random number, and then tested N-1 over and over > to see if it was prime. > I was trying to think how to coordinate the N value among all the > processes so that they wouldn't try the same ones. > I.e.: But the probability of that is 10^-N (N= number of digits) so for 80 digit primes 10^-80 > Main program selects 1234567 as a random number, and knows it has 3 > Erlang nodes it can work on > Node one tries 1234566 > Node two tries 1234565 > Node three tries 1234564 > Then, when node one has found out that 1234566 isn't a prime > (unsurprisingly!), it would have to know where the other nodes were at > so it could know to check 1234563. I'd just tell the N nodes "go generate random numbers and test if they are primes then tell me if you've got a prime" the chance they do the same work is very small and if they do who cares. The simplicity of the algorithm overweighs any potential inefficiency. Once you've got a prime tell all the nodes to stop calculating. > > Although I suppose it doesn't really matter (for this program) if > they're all checking 123456* - but for some applications, they'd need > to be in sync. > > The other thing I was wondering was what would happen if I fired up a > new node in the middle of this running - how would it join in? Are > there tried-and-tested software patterns to handle this sort of > dynamic loading? No - you could subscribe to node-up messages and fiddle with your algorithm /Joe > > C > ________________________________________________________________ erlang-questions mailing list. See http://www.erlang.org/faq.html erlang-questions (at) erlang.org |
| Free embeddable forum powered by Nabble | Forum Help |