MPI for ns-3

by Josh Pelkey

Hi all,

We have the distributed MPI code ready for review.  Below is an
overview of the code and the changes made.

http://codereview.appspot.com/109068/show

===
Overview [updated 2 Nov 2009].  The distributed simulation code is
designed to allow a single topology to be split into "n"
sub-topologies and executed on "n" linux boxes in a distributed
memory environment.  The current implementation only allows packets
sent across point-to-point links to be sent across simulator
boundaries; we hope the next release will support wireless.
We tried to change as little as possible of the existing code base
and isolate our changes in separate modules where possible.

Details.
point-to-point-channel.[h,cc]  - Made TransmitStart virtual so the
new subclass (see below) can override.

point-to-point-remote-channel.[h,cc] - new subclass of p2p channel;
used to connect simulated nodes on one simulator to nodes on a
separate simulator.  Overrides the TransmitStart function and
uses MPI to send the packet across simulator processes.

point-to-point-helper.[h,cc] - Checks the system id on the two nodes
being connected, and uses a p2p-remote-channel rather than a p2p-channel
if the system ids do not match.

node.[h,cc] - Added a "SetSystemId" function, to allow the system id to
be set after the node is constructed, rather than passing a parameter
to the constructor; this exists because there was no way to use
non-default constructors on object creation.  I believe this might have
been fixed since then, so perhaps this is no longer needed.

global-route-manager.cc - Do not compute routes for any node whose system-id
does not match the simulator's system id.  Clearly, we don't need to
compute routes in simulator A for nodes modeled in simulator B.

default-simulator-impl.[h,cc] - Added virtual "GetSystemId" method,
which returns the system id (MPI rank) of this process.  For
the default-simulator object it always returns zero.

distributed-simulator-impl.[h,cc] - new subclass of default-simulator-impl.
The meat of this is the overridden "run" method that handles all of
the time management needed, using MPI.

mpi-interface.[h,cc] - All MPI related processing is here in a single
spot, rather than across several modules.

buffer.[h,cc] - Added a "CreateLimitedCopy" function which is used when
serializing packets for MPI.  CreateLimitedCopy does not copy into the buffer
the zero byte data of a packet; it only includes the number of bytes of zero
byte data.

packet.[h,cc] - Added new constructor for rebuilding packets that cross
simulator boundaries.  Also added serialization/deserialization for MPI.
The format for this is shown in the function comments.

packet-metadata.[h,cc] - Added a rank id to the packet-metadata which
corresponds to the LP that the packet belongs to.  A new constructor is
also available which creates packet-metadata with a specific rank.  This
is necessary when packets cross simulator boundaries, and the packet
(and metadata) must be rebuilt.
===
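
The CreateLimitedCopy idea from the buffer.[h,cc] entry above (send the real
bytes plus only a count of the zero-filled payload) can be illustrated with a
minimal Python sketch.  This is not the ns-3 code: the wire layout (two 32-bit
lengths followed by the real bytes) and all function names are assumptions
made up for this illustration; the actual format is the one documented in the
packet.[h,cc] function comments.

```python
import struct
from typing import Tuple

# Hypothetical wire layout, for this sketch only:
#   uint32 real_len | uint32 zero_len | real bytes
_HEADER = struct.Struct("!II")


def serialize_limited(real: bytes, zero_len: int) -> bytes:
    """Encode the real bytes plus only the *count* of zero-filled bytes."""
    return _HEADER.pack(len(real), zero_len) + real


def deserialize_limited(wire: bytes) -> Tuple[bytes, int]:
    """Recover the real bytes and the zero-byte count from the wire form."""
    real_len, zero_len = _HEADER.unpack_from(wire)
    start = _HEADER.size
    return wire[start:start + real_len], zero_len


def expand(real: bytes, zero_len: int) -> bytes:
    """Rebuild the full packet contents on the receiving side."""
    return real + bytes(zero_len)
```

With this layout, a packet carrying 5 real bytes and 1000 bytes of zero-filled
payload would cross the simulator boundary as 13 bytes rather than 1005.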

I also have two issues I'd like to discuss here about the MPI code.  

1) Enabling mpi using --with-mpi switch
I have modified the wscript to accept a --with-mpi switch to compile
with NS3_MPI included in CXXDEFINES.  In order to compile correctly,
the CXX environment variable must also be set to the path of mpic++.  
Right now I force the user to do this manually.  For example, if they
choose the --with-mpi switch, but don't have CXX set appropriately,
WAF will exit and alert the user of the issue.  Is there a way to set
this environment variable automatically if --with-mpi is chosen and the
mpic++ program is found?  

2) MPI examples in examples/mpi/
These examples are not built automatically through WAF, as using MPI
requires some additional setup.  The examples must also be run with
mpirun, rather than WAF.  A README file exists in this directory
which explains what must be done in order to run the MPI examples.  
I couldn't think of a good way to do this automatically, so that's
why the examples are not built with WAF but a Makefile in their
directory.

Thanks,
George Riley and Josh Pelkey

Re: MPI for ns-3

by Faker Moatamri

jpelkey@... wrote:
> Hi all,
>
> We have the distributed MPI code ready for review.  Below is an
> overview of the code and the changes made.
>
> http://codereview.appspot.com/109068/show
>
>  
I reviewed your code; please find my comments at

http://codereview.appspot.com/109068/show

> I also have two issues I'd like to discuss here about the MPI code.  
>
> 1) Enabling mpi using --with-mpi switch
> I have modified the wscript to accept a --with-mpi switch to compile
> with NS3_MPI included in CXXDEFINES.  In order to compile correctly,
> the CXX environment variable must also be set to the path of mpic++.  
> Right now I force the user to do this manually.  For example, if they
> choose the --with-mpi switch, but don't have CXX set appropriately,
> WAF will exit and alert the user of the issue.  Is there a way to set
> this environment variable automatically if --with-mpi is chosen and the
> mpic++ program is found?  
> 2) MPI examples in examples/mpi/
> These examples are not built automatically through WAF, as using MPI
> requires some additional setup.  The examples must also be run with
> mpirun, rather than WAF.  A README file exists in this directory
> which explains what must be done in order to run the MPI examples.  
> I couldn't think of a good way to do this automatically, so that's
> why the examples are not built with WAF but a Makefile in their
> directory.
>  
Gustavo, can you please give him an answer (if possible) for those two
questions?
> Thanks,
> George Riley and Josh Pelkey
>  


Re: MPI for ns-3

by Gustavo Carneiro

2009/11/4 <jpelkey@...>

> Hi all,
>
> We have the distributed MPI code ready for review.  Below is an
> overview of the code and the changes made.
>
> http://codereview.appspot.com/109068/show
>
> [...]
>
> I also have two issues I'd like to discuss here about the MPI code.
>
> 1) Enabling mpi using --with-mpi switch
> I have modified the wscript to accept a --with-mpi switch to compile
> with NS3_MPI included in CXXDEFINES.  In order to compile correctly,
> the CXX environment variable must also be set to the path of mpic++.
> Right now I force the user to do this manually.  For example, if they
> choose the --with-mpi switch, but don't have CXX set appropriately,
> WAF will exit and alert the user of the issue.  Is there a way to set
> this environment variable automatically if --with-mpi is chosen and the
> mpic++ program is found?
>

What does this mpic++ do?  Do you have to compile every source file with it,
or just selected source files?

You can modify CXX on a per "taskgen" basis, for example:

    obj = bld.create_ns3_program('csma-bridge',
                                 ['bridge', 'csma', 'internet-stack'])
    obj.source = 'csma-bridge.cc'
    obj.env['CXX'] = '/usr/bin/mpic++'  # <<< added

To make this conditional, run a configure check:

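def configure(conf):
    # Stores the path in conf.env['MPICXX'] if mpic++ is found.
    conf.find_program('mpic++', var='MPICXX')

def build(bld):
    obj = bld.create_task_gen(...)
    # Only switch compilers when the configure check found mpic++.
    if obj.env['MPICXX']:
        obj.env['CXX'] = obj.env['MPICXX']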


Or something like this...
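
The selection logic discussed here (use mpic++ only when MPI is requested and
the wrapper is actually installed) can also be sketched in plain Python,
outside of waf.  The function name pick_cxx, the "g++" default, and the
injectable "which" parameter are assumptions for this illustration; in a real
wscript the result would presumably be stored in env['CXX'].

```python
import shutil
from typing import Callable, Optional


def pick_cxx(with_mpi: bool,
             default_cxx: str = "g++",
             which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the compiler to use; prefer mpic++ when MPI is requested."""
    if not with_mpi:
        return default_cxx
    mpicxx = which("mpic++")  # full path, or None if not found in PATH
    if mpicxx is None:
        raise RuntimeError("--with-mpi was given but mpic++ was not found")
    return mpicxx
```

Passing "which" as a parameter keeps the lookup testable on machines without
an MPI installation.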


>
> 2) MPI examples in examples/mpi/
> These examples are not built automatically through WAF, as using MPI
> requires some additional setup.  The examples must also be run with
> mpirun, rather than WAF.  A README file exists in this directory
> which explains what must be done in order to run the MPI examples.
> I couldn't think of a good way to do this automatically, so that's
> why the examples are not built with WAF but a Makefile in their
> directory.
>

I can't imagine why building with WAF is harder than with Makefile.  Surely
we have example code already using external libraries, such as gtk+2 or
sqlite.  Can you be more specific on what you don't know how to do?

As for running with mpirun, I suppose we could modify waf --run to call
mpirun when mpi is enabled.  If you are unable to do it, give me a mercurial
branch URL (I need code that I can build and extend, none of this review
stuff) and I'll post a patch.  Anyway, the modifications should be similar
to what is already done with the --valgrind option, which modifies --run to
run via valgrind.

Hope this helps.

Regards,

--
Gustavo J. A. M. Carneiro
INESC Porto, Telecommunications and Multimedia Unit
"The universe is always one step beyond logic." -- Frank Herbert

Re: MPI for ns-3

by Josh Pelkey

Thanks for the comments Faker and Gustavo.  I'll be fixing up the code more
over the next few days and will re-post.  Gustavo, I will try to answer your
questions below.

> What does this mpic++ do?

mpic++ is simply a wrapper around the underlying C++ compiler.  Open MPI
programs require some MPI specific libraries as well as some header files
which may not be in a standard location.  In short, it is easiest to just
compile everything with mpic++ when --with-mpi is selected.  The main issue
I was having is automatically setting the CXX environment variable if
--with-mpi was selected and the mpic++ program was found.  I do suppose it
would be possible to individually select which files should be compiled with
mpic++, but again, I think it is just easiest to compile everything when
--with-mpi is selected, since this will not affect non-MPI code in any way.

> I can't imagine why building with WAF is harder than with Makefile.

You are probably right here.  I am just unsure of how to do this.  There are
two separate issues.  First, the LD_LIBRARY_PATH needs to be set to the path
of the ns-3 build directory before the examples are run with mpirun.
Otherwise, mpirun can't find libns3.so.  So I guess this is another case
where I am wondering if we can automatically set environment variables when
--with-mpi is selected.
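
As a rough illustration of the first issue, a launcher could prepare the
environment before starting mpirun.  The sketch below is plain Python rather
than waf code, and the helper name env_with_library_path is hypothetical; it
only shows prepending a build directory to LD_LIBRARY_PATH without clobbering
an existing value.

```python
import os
from typing import Dict, Mapping


def env_with_library_path(build_dir: str,
                          base_env: Mapping[str, str] = os.environ
                          ) -> Dict[str, str]:
    """Return a copy of base_env with build_dir first in LD_LIBRARY_PATH."""
    env = dict(base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    if existing:
        env["LD_LIBRARY_PATH"] = build_dir + os.pathsep + existing
    else:
        env["LD_LIBRARY_PATH"] = build_dir
    return env
```

The returned dictionary can be handed to subprocess.run(..., env=env), so the
caller's own environment is left untouched.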

The second issue is mpirun.  I know you mentioned that this could be done in
much the same way as valgrind, but again, I am unfamiliar with this
process.  I'll just explain what needs to happen first.  mpirun is expecting
a number of command line arguments including the number of logical
processors (among other things).  So I guess the question is, can this be
built into WAF?  Something like ./waf --run --mpirun -np 2
./test-distributed --NIX=0.  In this case, the "np" argument is for the
number of logical processors, and must be passed into mpirun.
"test-distributed" is the user script that must be run with mpirun,
accepting its own command line arguments.  I imagine this is very possible,
but I'm just not sure how to do it.
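
To make the shape of that command line concrete, here is a small Python sketch
that assembles the argument list.  It is not waf code, and mpirun_command is a
hypothetical helper; the only mpirun option it relies on is "-np", as used in
the example above.

```python
from typing import List, Sequence


def mpirun_command(program: str, num_procs: int,
                   program_args: Sequence[str] = ()) -> List[str]:
    """Build an argv list of the form: mpirun -np N program [args...]."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    # mpirun's own options come first; everything after the program name
    # is passed through to the user script unchanged.
    return ["mpirun", "-np", str(num_procs), program, *program_args]
```

For the example above, mpirun_command("./test-distributed", 2, ["--NIX=0"])
gives a list that can be passed directly to subprocess.run.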

Also, my mercurial branch is here:
http://code.nsnam.org/jpelkey3/ns-3-distributed/

Thanks again,
Josh Pelkey

On Fri, Nov 6, 2009 at 10:12 AM, Gustavo Carneiro <gjcarneiro@...> wrote:

> [...]

Re: MPI for ns-3

by craigdo

Hi,

FYI, there is a conflict on Fedora systems between libotf and openmpi.  Both
provide a "%{_bindir}/otfdump".  One OTF is apparently OpenTypeFont from
emacs and the other is OpenTraceFormat from openmpi.

If you "yum remove libotf" to get rid of the pre-existing /usr/bin/otfdump,
it removes emacs where the other dependency lies.

Some googling reveals that a new version of openmpi will fix this; but until
that is released (I can't find one), a workaround seems to be:

1) If you care, rename the tiny otfdump which emacs says it needs (but I
don't know what obscure thing uses it):

   sudo mv /usr/bin/otfdump /usr/bin/otfdump.emacs-version

2) Manually resolve openmpi dependencies:

   sudo yum install libgfortran libtorque numactl

3) Download rpm packages,

    openmpi-1.3.1-1.fc11.i586.rpm
    openmpi-devel-1.3.1-1.fc11.i586.rpm
    openmpi-libs-1.3.1-1.fc11.i586.rpm
    openmpi-vt-1.3.1-1.fc11.i586.rpm

from

    http://mirrors.kernel.org/fedora/releases/11/Everything/i386/os/Packages/

4) Force the packages in:

  sudo rpm -ivh --force openmpi-1.3.1-1.fc11.i586.rpm \
      openmpi-libs-1.3.1-1.fc11.i586.rpm \
      openmpi-devel-1.3.1-1.fc11.i586.rpm \
      openmpi-vt-1.3.1-1.fc11.i586.rpm

I haven't run the example yet, but this does seem to install openmpi on
Fedora 11.

-- Craig




Re: MPI for ns-3

by Mathieu Lacage

hi josh,

I will review your code more thoroughly later, but here are a few
comments.

On Wed, 2009-11-04 at 10:48 -0500, jpelkey@... wrote:

> node.[h,cc] - Added a "SetSystemId" function, to allow the system id to
> be set after the node is constructed, rather than passing parameter
> to constructor; this exists because there was no way to use
> non-default constructors on object creation.  I believe this might have
> been fixed since then, so perhaps this might not be needed.

Yes: you can now remove SetSystemId and do this:

Ptr<Node> node = CreateObject<Node> (mySystemId);

> global-route-manager.cc - Do not compute routes for any node whose system-id
> does not match the simulator's system id.  Clearly, we don't need to
> compute routes in simulator A for nodes modeled in simulator B.

What happens if I want to send IP packets from node 0 in simulator A to
node 1 in simulator B?  How does ns-3 know the route from 0 to 1?

Mathieu


Re: MPI for ns-3

by Josh Pelkey

Hi Mathieu,

> What happens if I want to send ip packets from node 0 in simulator A to
> node 1 in simulator B ? How does ns-3 know the route from 0 to 1 ?

The full topology is built at each logical processor.  So the packet will go
from node 0 to node 1 as normal.  The only difference is when it reaches the
simulator boundary at simulator A, the packet is serialized and sent via MPI
to simulator B.  It is then received and deserialized into a packet and sent
on its way.  Since both simulators have a full view of the topology, they
can route packets normally.
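
The behavior described above can be pictured with a toy Python model.  This is
only an illustration of the idea (every rank knows the whole topology, and a
hop whose two endpoints belong to different ranks goes over MPI); the
three-node topology, the OWNER table, and the function names are all invented
for the sketch and are not taken from the ns-3 code.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical topology, known in full to every rank: 0 -- 2 -- 1
LINKS: Dict[int, List[int]] = {0: [2], 2: [0, 1], 1: [2]}
OWNER: Dict[int, int] = {0: 0, 2: 0, 1: 1}  # node id -> owning rank


def shortest_path(src: int, dst: int) -> List[int]:
    """Breadth-first search; assumes dst is reachable from src."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbor in LINKS[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def hops(src: int, dst: int) -> List[Tuple[int, int, str]]:
    """Label each hop 'local' or 'mpi' by comparing endpoint owners."""
    path = shortest_path(src, dst)
    return [(a, b, "local" if OWNER[a] == OWNER[b] else "mpi")
            for a, b in zip(path, path[1:])]
```

Here hops(0, 1) marks the 0-to-2 hop as local and the 2-to-1 hop as the one
that would be serialized and sent to the other simulator.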

--
Josh

On Mon, Nov 16, 2009 at 10:43 AM, Mathieu Lacage <mathieu.lacage@...> wrote:

> [...]

Re: MPI for ns-3

by craigdo

> > What happens if I want to send ip packets from node 0 in simulator A to
> > node 1 in simulator B ? How does ns-3 know the route from 0 to 1 ?
>
> The full topology is built at each logical processor.  So the packet will go
> from node 0 to node 1 as normal.  The only difference is when it reaches the
> simulator boundary at simulator A, the packet is serialized and sent via MPI
> to simulator B.  It is then received and deserialized into a packet and sent
> on its way.  Since both simulators have a full view of the topology, they
> can route packets normally.

This is one of the subjects that concerned me.  Right now, as you say, the
example code builds the entire topology and applications are either enabled
or not based on the LP number.  What happens in more complicated topologies
that include other devices than point-to-point?

Consider a dumbbell with wifi or other hybrid networks on both sides, the
idea being to split the processing of the left side and the right side of
the dumbbell onto two processors.  How do you create this thing?  What
guidance do we provide?  If you don't switch the wifi network creation on
LP, you have both sides running everything.  If you create the entire
topology on both sides and only enable applications by switching on LP, you
have, for example, device beacons, ARP, mobility models, tracing, etc.
running on both wireless networks on both processors.  It seems to me that
the optimal thing to do would be NOT to build the entire topology on both
LPs, but to switch the building of the left-side topology and the right-side
topology on LP as well as the applications.  In this case, global routing
cannot work.  Have we tested this?
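
The two strategies contrasted above can be written down as a small Python
sketch.  The node names, the PLACEMENT table, and nodes_to_build are
hypothetical and exist only to show the difference in what each rank would
instantiate; a real script would also need some representation of the remote
end of the cross-rank point-to-point link, which is left out here.

```python
from typing import Dict, List

# Hypothetical dumbbell: node name -> rank that should simulate it.
PLACEMENT: Dict[str, int] = {
    "left-wifi-ap": 0, "left-sta-1": 0, "left-router": 0,
    "right-router": 1, "right-sta-1": 1, "right-wifi-ap": 1,
}


def nodes_to_build(rank: int, partitioned: bool) -> List[str]:
    """List the nodes a given rank instantiates under each strategy."""
    if not partitioned:
        # Strategy 1: every rank builds everything and only the
        # applications are switched on rank.
        return sorted(PLACEMENT)
    # Strategy 2: each rank builds only the nodes placed on it.
    return sorted(name for name, owner in PLACEMENT.items() if owner == rank)
```

Under strategy 1 both ranks carry all six nodes (and their beacons, ARP, and
so on); under strategy 2 each rank carries three, at the cost of no longer
having the full topology available for global routing.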

I took mixed-wireless.cc and replaced the wireless backbone with a
point-to-point link and enabled mpi, only switching applications according
to LP.  I created nodes with the appropriate LP.  When I ran it, an assert
popped: check(m_current) error in buffer.h.  I didn't debug this, so I don't
know exactly what is happening; but this is a surprisingly low-level thing
for such a simple port.  I am concerned about fragility.  I think we need
more examples like an mpi-mixed-wireless.

WRT documentation, what is the recommended plan for parallelizing something
like a mixed-wireless.cc with the backbone replaced with a point-to-point?
What is the basic model for running which bits on what processor?  How is
routing expected to work?  Should we switch topology creation on rank?
Should we run OLSR across ranks?  What are the options?  Tradeoffs?  How is
tracing expected to work?  You have got to switch tracing on LP number,
unless you are really running on multiple machines, right?  We should
document stuff like this, otherwise a simple change to machinefile could
crash the scripts.

I think there are a lot of unanswered questions here.  We really need some
more documentation (in fact, I don't think there's any Doxygen at all on the
additions) and some more examples.  We also need some tests.  I found no
tests anywhere.  We need something for the new channel, the new simulator
and a system test if mpi is present.  This would be much easier if the code
was integrated into the system better.

The build system changes seem to be fairly straightforward.  Adding Doxygen
seems to be straightforward.  Adding manual entries just requires time.  I
could maybe be convinced that these could be treated as P1 bugs with a
commitment to address them.  What do you think Faker -- you are the RM?

What really concerns me is fragility.  I would really like to see some more
involved examples ported to this environment.  mixed-wireless comes
immediately to mind.  So does tcp-star-server.cc, star.cc,
simple-point-to-point-olsr.cc, and dynamic-global-routing.cc.  This would
also serve to find the porting points that cause unnecessary pain (like the
create node with system id bit).  We need more tests.

-- Craig




Re: MPI for ns-3

by Mathieu Lacage

On Mon, 2009-11-16 at 11:48 -0500, Josh Pelkey wrote:

> > What happens if I want to send ip packets from node 0 in simulator A to
> > node 1 in simulator B ? How does ns-3 know the route from 0 to 1 ?
>
> The full topology is built at each logical processor.  So the packet
> will go from node 0 to node 1 as normal.  The only difference is when
> it reaches the simulator boundary at simulator A, the packet is
> serialized and sent via MPI to simulator B.  It is then received and
> deserialized into a packet and sent on its way.  Since both simulators
> have a full view of the topology, they can route packets normally.

I have not yet looked at the code but your original comment said:

global-route-manager.cc - Do not compute routes for any node whose
system-id does not match the simulator's system id.  Clearly, we don't
need to compute routes in simulator A for nodes modeled in simulator B.

So, what I am asking is: if you don't compute routes to reach the nodes
in simulator B from simulator A, how does the forwarding code know which
device to send a packet to in order to reach a node in simulator B from a
node in simulator A?

Mathieu