multi-thread islands
Hi All,
I know ODE is not thread-safe by default. Has anyone tried multi-threading the islands? Or does anyone have an idea of what needs to happen before ODE can be made thread-safe?
Thanks,
John
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "ode-users" group.
To post to this group, send email to ode-users@...
To unsubscribe from this group, send email to ode-users+unsubscribe@...
For more options, visit this group at http://groups.google.com/group/ode-users?hl=en
-~----------~----~----~----~------~----~------~--~---
|
|
Re: multi-thread islands
In our lab we began with ODE and replaced all of the stages with re-entrant, data-parallel versions. The integration stage is still ODE's, however; we simply modified it to be re-entrant, among other things. This was a while ago, and I don't remember the specifics of everything we did, but here are a couple of guidelines:
Of course, you want to eliminate access to global variables. Any global variables should be consolidated in some way, for example by passing them as parameters.
It is best to just look through and step through the code, making sure everything is local *and non-static* (although I think ODE uses a lot of global variables rather than local static ones). Be very thorough about accounting for each variable.
Daniel
On Sep 15, 6:14 pm, John Hsu <hsujohn...@...> wrote:
> Hi All,
>
> I know ODE is not thread safe by default, has anyone tried multi-threading
> the islands? Or has some idea on what needs to happen before ODE is made
> thread safe?
>
> Thanks,
> John
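The guideline above (no globals, no function-local statics, working memory passed in as a parameter) can be illustrated with a minimal sketch. This is not ODE code: `StageScratch`, `sum_of_squares_static`, `sum_of_squares` and `run_parallel` are hypothetical names invented for the illustration, and the "stage" is just a sum of squares.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical working memory for one solver stage.
struct StageScratch {
    std::vector<double> tmp;
};

// Not re-entrant: the function-local static buffer is shared by every caller,
// so two threads calling this at the same time would race on `tmp`.
double sum_of_squares_static(const std::vector<double>& v) {
    static std::vector<double> tmp;
    tmp.assign(v.begin(), v.end());
    double s = 0.0;
    for (double x : tmp) s += x * x;
    return s;
}

// Re-entrant: all working memory is passed in, so each caller can own its own.
double sum_of_squares(const std::vector<double>& v, StageScratch& scratch) {
    scratch.tmp.assign(v.begin(), v.end());
    double s = 0.0;
    for (double x : scratch.tmp) s += x * x;
    return s;
}

// Runs the re-entrant version on one thread per input, one scratch object each.
std::vector<double> run_parallel(const std::vector<std::vector<double>>& inputs) {
    std::vector<double> results(inputs.size(), 0.0);
    std::vector<StageScratch> scratch(inputs.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back([&inputs, &results, &scratch, i] {
            results[i] = sum_of_squares(inputs[i], scratch[i]);
        });
    }
    for (std::thread& w : workers) w.join();
    assert(results.size() == inputs.size());
    return results;
}
```

Only the second function is safe to call from several threads at once; the first is kept to show the kind of hidden shared state that has to be hunted down.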
|
|
Re: multi-thread islands
I plan to implement a threading interface for the steppers sometime in the future. For now, you will have to do it on your own.
See dxProcessIslands().
Oleh Derevenko -- ICQ: 36361783
|
|
|
Re: multi-thread islands
I plan to run very large simulations. Is there any documentation for dxProcessIslands() and the island system?
Thanks.
On 16 Sep, 15:38, Oleh Derevenko <Oleh.Dereve...@...> wrote:
> I have it in my plans to implement threading interface for steppers sometimes in the future. So far, do it on your own.
> See dxProcessIslands()
>
> Oleh Derevenko
> -- ICQ: 36361783
>
> ----- Original Message -----
> From: John Hsu
> To: ode-users@...
> Sent: Wednesday, September 16, 2009 4:14 AM
> Subject: [ode-users] multi-thread islands
>
> Hi All,
>
> I know ODE is not thread safe by default, has anyone tried multi-threading the islands? Or has some idea on what needs to happen before ODE is made thread safe?
>
> Thanks,
> John
|
|
Re: multi-thread islands
It's an internal function in ODE; it is not part of the public interface. You'll have to understand the code and integrate ODE into your project so that you can bypass or change the public steps and add your own logic in dxProcessIslands().
Oleh Derevenko -- ICQ: 36361783
----- Original Message -----
From: "Gabriel" <newbie.x11@...>
To: "ode-users" <ode-users@...>
Sent: Wednesday, September 16, 2009 10:22 PM
Subject: [ode-users] Re: multi-thread islands
> I have plan to process very large simulations,
> have a documentation of dxProcessIslands()? and island system?
> thanks.
|
|
Re: multi-thread islands
Thanks for the tips. In my naive initial attempt, with TLS (--enable-ou) and dWorldQuickStep, I simply offloaded the following stepper calls in dxProcessIslands to a thread pool, one task per island:
    BEGIN_STATE_SAVE(context, stepperstate) {
        // now do something with body and joint lists
        stepper (context,world,bodystart,bcount,jointstart,jcount,stepsize);
    } END_STATE_SAVE(context, stepperstate);
and called dAllocateODEDataForThread(dAllocateMaskAll); for every thread.
The simulation runs fine until bodies in my test world come into contact. I am digging through the code to look for potential issues, but thought I would ping the group to avoid duplicating previous work :)
John
On Wed, Sep 16, 2009 at 12:22 PM, Gabriel <newbie.x11@...> wrote:
> [...]
|
|
Re: multi-thread islands
'context' contains the memory (the stack replacement) used by the stepper. You must have a separate copy of the context for every thread.
Also, do not mix world stepping and collision detection together. The stepping can be threaded per island (the easiest way), and the collision detection can be threaded per space (the narrow phase only!).
Oleh Derevenko -- ICQ: 36361783
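As a rough illustration of "one context per thread", here is a minimal sketch of a stack-replacement style scratch allocator with save/restore, in the spirit of BEGIN_STATE_SAVE / END_STATE_SAVE. It does not use ODE's real dxWorldProcessContext: `ScratchContext`, `step_island` and `step_all` are hypothetical names, and the "stepper" only doubles and sums some numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a stepper context: a bump allocator over one
// preallocated buffer, with a save/restore mark instead of individual frees.
class ScratchContext {
public:
    explicit ScratchContext(std::size_t capacity) : buffer_(capacity), used_(0) {}

    // Hands out `count` doubles from the buffer.
    double* allocate(std::size_t count) {
        assert(used_ + count <= buffer_.size());
        double* p = buffer_.data() + used_;
        used_ += count;
        return p;
    }
    std::size_t save_state() const { return used_; }
    void restore_state(std::size_t mark) { used_ = mark; }

private:
    std::vector<double> buffer_;
    std::size_t used_;
};

// Hypothetical per-island "stepper": doubles each value and sums the result,
// using scratch memory taken from the context it was given.
double step_island(ScratchContext& ctx, const std::vector<double>& island) {
    const std::size_t mark = ctx.save_state();
    double* tmp = ctx.allocate(island.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < island.size(); ++i) {
        tmp[i] = 2.0 * island[i];
        sum += tmp[i];
    }
    ctx.restore_state(mark);  // release the scratch memory again
    return sum;
}

// One thread per island and one private context per thread, so no two
// threads ever bump the same allocation pointer.
std::vector<double> step_all(const std::vector<std::vector<double>>& islands) {
    std::vector<ScratchContext> contexts;
    for (const auto& island : islands) contexts.emplace_back(island.size());
    std::vector<double> results(islands.size(), 0.0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < islands.size(); ++i) {
        workers.emplace_back([&islands, &contexts, &results, i] {
            results[i] = step_island(contexts[i], islands[i]);
        });
    }
    for (std::thread& w : workers) w.join();
    return results;
}
```

Sharing a single `ScratchContext` between the threads would make `allocate` and `restore_state` race with each other, which is the failure mode the advice above is meant to avoid.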
|
|
|
Re: multi-thread islands
Perhaps use the sources from before my changes that removed stack allocation in QuickStep; they will be easier for you to thread.
Oleh Derevenko -- ICQ: 36361783
|
|
|
Re: multi-thread islands
> Thanks for the tips. In my naive initial attempt, with TLS(--enable-ou) and
> using dWorldQuickStep, I simply offloaded following stepper calls in
> dxProcessIslands to a threadpool for individual island,
I sort of did this using dWorldStep. There are two things I can think of that you should take care of if you haven't already. They might also apply to dWorldQuickStep:
The stepping function finds the separate islands through a fairly simple connected-component algorithm. This uses a "tag" variable that is local to the "body" object. However, the constraint satisfaction algorithm also used the same variable. Fixing this just required creating a second "tag" variable so that the connected-component calculation didn't get stomped while islands were being processed.
The other thing is that every body that gets updated is automatically moved to the head of the linked list stored in the World (so that inactive bodies can sit at the end of the list and not take up processing time). Changing a linked list is clearly not thread-safe, so you need to fix that.
I can't remember for sure, but there were probably a couple of other issues.
Incidentally, if you try to just spin each and every island into its own thread or onto a thread-pool queue, odds are good that you'll generally get worse performance than the single-threaded version on the same problem. The overhead of creating threads or synchronizing the work queue (not to mention the other synchronization requirements, such as protecting the linked lists) will destroy your performance if, for example, you have a large number of single-body islands. You'll do a lot better if you have some way of estimating the amount of work required (how many bodies/constraints are in the island) and then partition the work among threads in larger-grained, as-equal-as-possible chunks.
jc
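The last suggestion (estimate the work per island, then hand out roughly equal chunks) could be sketched as follows. The cost model `bodies + joints` and the greedy "largest island first, into the least-loaded chunk" strategy are assumptions made for this illustration, not something taken from ODE or from the post; `Island`, `estimate_cost` and `partition_islands` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Island {
    int bodies;
    int joints;
};

// Hypothetical cost model: one unit of work per body and per joint.
std::size_t estimate_cost(const Island& island) {
    return static_cast<std::size_t>(island.bodies + island.joints);
}

// Greedy partition: visit islands from most to least expensive and put each
// one into the chunk that currently has the smallest total cost.
// Returns, for every worker, the indices of the islands it should process.
std::vector<std::vector<std::size_t>> partition_islands(
        const std::vector<Island>& islands, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::size_t> order(islands.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&islands](std::size_t a, std::size_t b) {
                         return estimate_cost(islands[a]) > estimate_cost(islands[b]);
                     });

    std::vector<std::vector<std::size_t>> chunks(workers);
    std::vector<std::size_t> load(workers, 0);
    for (std::size_t index : order) {
        const std::size_t target = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        chunks[target].push_back(index);
        load[target] += estimate_cost(islands[index]);
    }
    return chunks;
}
```

Each chunk would then be submitted as a single task, so many single-body islands share one synchronization point instead of paying for one each.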
|
|
Re: multi-thread islands
Thanks Oleh,
I am only trying to thread the stepping part for now. I'll look into how to duplicate 'context' for each thread. Do you have an example?
John
On Wed, Sep 16, 2009 at 12:56 PM, Oleh Derevenko <Oleh.Derevenko@...> wrote:
> [...]
|
|
Re: multi-thread islands
No, I don't. Just allocate the same amount of memory as the original context contains and set up the structure's pointers into it. That will be fine for now.
On 18 Sep, 00:24, John Hsu <hsujohn...@...> wrote:
> Thanks Oleh,
> I am only trying to thread the stepping part for now. I'll look into how to
> duplicate 'context' for each thread, do you have an example?
> John
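One way to read "allocate the same memory size and set up the pointers" is sketched below: copy the memory block, then rebase every pointer that pointed into the old block so that it points at the same offset in the new one. The `Context` struct here is a hypothetical simplification with a single internal pointer, not ODE's dxWorldProcessContext; `init_context` and `clone_context` are likewise invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical context: one memory block plus a pointer into that block.
struct Context {
    std::vector<int> memory;
    int* island_sizes = nullptr;
    std::size_t island_count = 0;
};

// Fills the block and points island_sizes at its first element.
void init_context(Context& ctx, const std::vector<int>& sizes) {
    ctx.memory = sizes;
    ctx.island_sizes = ctx.memory.empty() ? nullptr : ctx.memory.data();
    ctx.island_count = sizes.size();
}

// Duplicates src into dst: same amount of memory, and the internal pointer is
// rebased by its offset so it refers to dst's block rather than src's.
void clone_context(const Context& src, Context& dst) {
    dst.memory = src.memory;
    dst.island_count = src.island_count;
    if (src.island_sizes != nullptr) {
        const std::ptrdiff_t offset = src.island_sizes - src.memory.data();
        assert(offset >= 0);
        dst.island_sizes = dst.memory.data() + offset;
    } else {
        dst.island_sizes = nullptr;
    }
}
```

Copying the struct member by member without the rebasing step would leave the clone's pointer aimed at the original block, so both threads would still share memory.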
|
|
Re: multi-thread islands
Hi jcooper,
Thank you for these very helpful tips. I think you mentioned in previous discussions that you had multi-threaded ODE before Oleh implemented the new stack replacement. Do you by any chance still have some benchmark results?
As for the overhead, I figured that if simple island threading works, I can then implement load balancing to group the islands. Plus, the problem we are solving might naturally have a substantial number of bodies (and workload) per island.
John
On Thu, Sep 17, 2009 at 7:50 AM, jcooper <josephcooper@...> wrote:
> [...]
|
|
Re: multi-thread islands
OK, I finally got ODE running island threads using thread pools. I get a pretty good speedup running multiple robots when they are not holding hands (e.g. http://ros.org/wiki/simulator_gazebo/Tutorials).
The patch still needs cleanup; I'll post it when it's a bit cleaner. Basically, I created multiple contexts, one per island, plus one shared context for the island arrays. I also added locks around the collision_space geom linked-list operations.
Thanks everyone for helping!
John
On Thu, Sep 17, 2009 at 3:14 PM, John Hsu <hsujohnhsu@...> wrote:
> Hi jcooper,
> [...]
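The idea of locking the geom linked-list operations can be sketched with a small intrusive list guarded by a mutex. This uses `std::mutex` from the standard library rather than the Boost mutex used in the actual patch, and `Geom`, `Space` and `add_concurrently` are simplified, hypothetical stand-ins rather than ODE's dxGeom and dxSpace.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical intrusive list node, loosely modelled on a geom in a space.
struct Geom {
    Geom* next = nullptr;   // next geom in the list
    Geom** tome = nullptr;  // address of the pointer that points at this geom
};

class Space {
public:
    void add(Geom* g) {
        std::lock_guard<std::mutex> lock(mutex_);
        link(g);
    }
    void remove(Geom* g) {
        std::lock_guard<std::mutex> lock(mutex_);
        unlink(g);
    }
    // Moves g to the head of the list inside one critical section, so no
    // other thread can observe the list between the unlink and the link.
    void dirty(Geom* g) {
        std::lock_guard<std::mutex> lock(mutex_);
        unlink(g);
        link(g);
    }
    std::size_t length() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t n = 0;
        for (Geom* g = first_; g != nullptr; g = g->next) ++n;
        return n;
    }

private:
    // Both helpers assume the caller already holds mutex_.
    void link(Geom* g) {
        g->next = first_;
        g->tome = &first_;
        if (first_ != nullptr) first_->tome = &g->next;
        first_ = g;
    }
    void unlink(Geom* g) {
        assert(g->tome != nullptr);
        if (g->next != nullptr) g->next->tome = g->tome;
        *g->tome = g->next;
        g->next = nullptr;
        g->tome = nullptr;
    }

    std::mutex mutex_;
    Geom* first_ = nullptr;
};

// Adds every geom and then marks it dirty, splitting the range across threads.
void add_concurrently(Space& space, std::vector<Geom>& geoms, std::size_t threads) {
    assert(threads > 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&space, &geoms, t, threads] {
            for (std::size_t i = t; i < geoms.size(); i += threads) {
                space.add(&geoms[i]);
                space.dirty(&geoms[i]);
            }
        });
    }
    for (std::thread& w : workers) w.join();
}
```

Without the lock, two threads relinking the head at the same time can lose nodes or corrupt the `tome` back-pointers.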
|
|
Re: multi-thread islands
Hi John,
Intel did something similar with their Intel Threading Building Blocks. Do you know how your implementation compares?
Nico
On Wed, Sep 30, 2009 at 11:02 PM, John Hsu <hsujohnhsu@...> wrote:
> Ok, finally got ODE running island threads using threadpools. I can get
> pretty good speed up running multiple robots when they are not holding hands
> (e.g. http://ros.org/wiki/simulator_gazebo/Tutorials).
> The patch still needs cleanup, I'll post it when it's a bit cleaner.
> Basically I created multiple contexts, one per island and one shared context
> for island arrays. Also, added locks for collision_space geom linked list
> operations.
> Thanks everyone for helping!
> John
|
|
Re: multi-thread islands
Hi Nico,
Looking at Intel's example, quoting from the book: "The first attempt gave a 1.1X speedup, whereas the second effort gave us a 1.29X speedup when run with 400 simple objects on a quad-core system, all in the span of less than a day of effort." Is this what you were referring to?
For our application we're simulating multiple robots, each with 31 DOFs, and assuming island == robot as long as the robots are not in contact with non-static objects. Here are some very rough preliminary numbers with 4 threads on my desktop, using quickstep with 10 inner iterations. Each figure is the ratio of elapsed sim-time to real-time:
1 robot: ~1.16X with MT and ~1.20X w/o MT
2 robots: ~.58X with MT and ~.54X w/o MT
3 robots: ~.33X with MT and ~.30X w/o MT
Given that quickstep is quite efficient, I expect to see a better speedup with more inner iterations as accuracy requirements increase. Here's quickstep with 60 inner iterations, with a slightly better speedup, as expected:
1 robot: ~.53X with MT and ~.55X w/o MT
2 robots: ~.32X with MT and ~.25X w/o MT
3 robots: ~.20X with MT and ~.15X w/o MT
Disclaimer: these numbers were not acquired in a very controlled environment. In this test case, collision detection, sensor generation (cameras, range scans, etc.) and motor controllers all take up significant amounts of the total simulation time. I'll post more results as a more detailed study becomes available.
If LCP takes up most of the computational effort (for example, stable box stacks without disabling bodies), the multi-threaded island speedup is closer to the ideal speedup.
On a separate topic, I am curious whether anyone has any recent benchmark comparisons between different physics engines as well.
Thanks,
John
On Thu, Oct 1, 2009 at 3:50 AM, Nico Kruithof <nicokruithof@...> wrote:
> [...]
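Since each figure in the post is a ratio of sim-time to real-time (higher is faster), the speedup from multi-threading is simply the MT figure divided by the non-MT figure. The small sketch below does that arithmetic on the numbers quoted in the post; the `Timing` struct and the function names are hypothetical, and the inputs are the poster's own rough measurements, so the results are only as reliable as those.

```cpp
#include <cassert>
#include <vector>

struct Timing {
    int robots;
    double with_mt;     // sim-time / real-time with multi-threading
    double without_mt;  // sim-time / real-time without multi-threading
};

// Speedup of the multi-threaded run: ratio of the two real-time factors.
double speedup(const Timing& t) {
    assert(t.without_mt > 0.0);
    return t.with_mt / t.without_mt;
}

// Figures quoted in the post for quickstep with 10 inner iterations.
std::vector<Timing> timings_10_iterations() {
    return {{1, 1.16, 1.20}, {2, 0.58, 0.54}, {3, 0.33, 0.30}};
}

// Figures quoted in the post for quickstep with 60 inner iterations.
std::vector<Timing> timings_60_iterations() {
    return {{1, 0.53, 0.55}, {2, 0.32, 0.25}, {3, 0.20, 0.15}};
}
```

With 10 inner iterations this gives roughly 0.97, 1.07 and 1.10 for one, two and three robots; with 60 inner iterations, roughly 0.96, 1.28 and 1.33. The single-robot case is slightly slower with threading in both runs, which fits a single island leaving nothing to parallelize.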
|
|
Re: multi-thread islands
Hi John,
Thanks for the timings. They look really impressive. An improvement of 33% is good! I assume that Intel had more than 3 islands when they got the 1.29x speedup.
A bit more context on the Intel Threading Building Blocks, as I realised that my reference was really coarse. It is a library that lets you split a program into "tasks", which are then (automatically) spread over several threads. Since starting a task is less expensive than starting a thread, the overhead of the parallelism should decrease.
In the book about the library there is an example in which they parallelized ODE. They started one instance of "find island" and, after finding the first island, started another thread to process it while finding the next island. Your quote comes from this example as well. They mention that the sample code is available for download, but I (Google) couldn't find it.
Bests,
Nico
On Sat, Oct 3, 2009 at 4:08 AM, John Hsu <hsujohnhsu@...> wrote:
> Hi Nico,
> [...]
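The "find an island, hand it off, keep searching" scheme described above can be sketched with the standard library. Note the substitution: this uses `std::async` rather than Intel TBB tasks, and it is not the book's sample code. The graph representation and the names `Graph`, `collect_island`, `process_island` and `find_and_process` are assumptions made for the example; the per-island "processing" just reports the island's size.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical body graph: adjacency[i] lists the bodies joined to body i.
using Graph = std::vector<std::vector<std::size_t>>;

// Collects the connected component containing `start`, tagging each body as
// it is reached so that it is never added to a second island.
std::vector<std::size_t> collect_island(const Graph& g, std::size_t start,
                                        std::vector<char>& tag) {
    assert(start < g.size());
    std::vector<std::size_t> island;
    std::vector<std::size_t> stack;
    tag[start] = 1;
    stack.push_back(start);
    while (!stack.empty()) {
        const std::size_t b = stack.back();
        stack.pop_back();
        island.push_back(b);
        for (std::size_t n : g[b]) {
            if (!tag[n]) {
                tag[n] = 1;
                stack.push_back(n);
            }
        }
    }
    return island;
}

// Hypothetical stand-in for stepping one island: reports its body count.
std::size_t process_island(std::vector<std::size_t> island) {
    return island.size();
}

// The search runs on the calling thread; each island is handed to an
// asynchronous task as soon as it is complete, while the search continues.
std::vector<std::size_t> find_and_process(const Graph& g) {
    std::vector<char> tag(g.size(), 0);
    std::vector<std::future<std::size_t>> pending;
    for (std::size_t b = 0; b < g.size(); ++b) {
        if (tag[b]) continue;
        std::vector<std::size_t> island = collect_island(g, b, tag);
        pending.push_back(
            std::async(std::launch::async, process_island, std::move(island)));
    }
    std::vector<std::size_t> sizes;
    for (auto& f : pending) sizes.push_back(f.get());
    return sizes;
}
```

Only the searching thread touches `tag`, and each task owns its own copy of the island's body list, so the sketch needs no locking; a real stepper sharing world state would.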
|
|
Re: multi-thread islandsHi All,
Attached is my patch for multi-threading islands in ode, please feel free to test it and provide feedbacks/questions/comments. Thank you. John On Sun, Oct 4, 2009 at 11:51 PM, Nico Kruithof <nicokruithof@...> wrote: Hi John, --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "ode-users" group. To post to this group, send email to ode-users@... To unsubscribe from this group, send email to ode-users+unsubscribe@... For more options, visit this group at http://groups.google.com/group/ode-users?hl=en -~----------~----~----~----~------~----~------~--~--- Index: ode/src/collision_space.cpp =================================================================== --- ode/src/collision_space.cpp (revision 1706) +++ ode/src/collision_space.cpp (working copy) @@ -168,7 +168,10 @@ // add geom->parent_space = this; - geom->spaceAdd (&first); + { + boost::mutex::scoped_lock lock(this->mutex); + geom->spaceAdd (&first); // lock mutex before alterning linked list + } count++; // enumerator has been invalidated @@ -189,7 +192,10 @@ dUASSERT (geom->parent_space == this,"object is not in this space"); // remove - geom->spaceRemove(); + { + boost::mutex::scoped_lock lock(this->mutex); + geom->spaceRemove(); // lock mutex before alterning linked list + } count--; // safeguard @@ -208,6 +214,7 @@ void dxSpace::dirty (dxGeom *geom) { + boost::mutex::scoped_lock lock(this->mutex); // lock mutex before alterning linked list geom->spaceRemove(); geom->spaceAdd (&first); } Index: ode/src/quickstep.cpp =================================================================== --- ode/src/quickstep.cpp (revision 1706) +++ ode/src/quickstep.cpp (working copy) @@ -589,7 +592,7 @@ dxJoint::Info1 info; }; -void dxQuickStepper (dxWorldProcessContext *context, +void dxQuickStepper (dxWorldProcessContext *shared_context,dxWorldProcessContext *context, dxWorld *world, dxBody * const *body, int nb, dxJoint * const *_joint, int _nj, dReal 
stepsize) { Index: ode/src/util.h =================================================================== --- ode/src/util.h (revision 1706) +++ ode/src/util.h (working copy) @@ -187,8 +188,8 @@ void CleanupContext(); - void SavePreallocations(int islandcount, int const *islandsizes, dxBody *const *bodies, dxJoint *const *joints); - void RetrievePreallocations(int &islandcount, int const *&islandsizes, dxBody *const *&bodies, dxJoint *const *&joints); + void SavePreallocations(int islandcount, int const *islandsizes, dxBody *const *bodies, dxJoint *const *joints, size_t const *islandreqs); + void RetrievePreallocations(int &islandcount, int const *&islandsizes, dxBody *const *&bodies, dxJoint *const *&joints, size_t const *&islandreqs); void OffsetPreallocations(size_t stOffset); void CopyPreallocations(const dxWorldProcessContext *othercontext); void ClearPreallocations(); @@ -204,6 +205,7 @@ int m_IslandCount; int const *m_pIslandSizes; + size_t const *m_pIslandReqs; dxBody *const *m_pBodies; dxJoint *const *m_pJoints; @@ -216,7 +218,7 @@ #define BEGIN_STATE_SAVE(context, state) void *state = context->SaveState(); #define END_STATE_SAVE(context, state) context->RestoreState(state) -typedef void (*dstepper_fn_t) (dxWorldProcessContext *context, +typedef void (*dstepper_fn_t) (dxWorldProcessContext *context,dxWorldProcessContext *island_context, dxWorld *world, dxBody * const *body, int nb, dxJoint * const *_joint, int _nj, dReal stepsize); Index: ode/src/quickstep.h =================================================================== --- ode/src/quickstep.h (revision 1706) +++ ode/src/quickstep.h (working copy) @@ -28,7 +28,7 @@ size_t dxEstimateQuickStepMemoryRequirements ( dxBody * const *body, int nb, dxJoint * const *_joint, int _nj); -void dxQuickStepper (dxWorldProcessContext *context, +void dxQuickStepper (dxWorldProcessContext *shared_context,dxWorldProcessContext *context, dxWorld *world, dxBody * const *body, int nb, dxJoint * const *_joint, int _nj, dReal 
stepsize); Index: ode/src/util.cpp =================================================================== --- ode/src/util.cpp (revision 1706) +++ ode/src/util.cpp (working copy) @@ -24,6 +24,8 @@ #include "objects.h" #include "joints/joint.h" #include "util.h" +#include <boost/thread/recursive_mutex.hpp> +#include <boost/bind.hpp> static void InternalFreeWorldProcessContext (dxWorldProcessContext *context); @@ -44,17 +46,19 @@ FreePreallocationsContext(); } -void dxWorldProcessContext::SavePreallocations(int islandcount, int const *islandsizes, dxBody *const *bodies, dxJoint *const *joints) +void dxWorldProcessContext::SavePreallocations(int islandcount, int const *islandsizes, dxBody *const *bodies, dxJoint *const *joints, size_t const *islandreqs) { m_IslandCount = islandcount; + m_pIslandReqs = islandreqs; m_pIslandSizes = islandsizes; m_pBodies = bodies; m_pJoints = joints; } -void dxWorldProcessContext::RetrievePreallocations(int &islandcount, int const *&islandsizes, dxBody *const *&bodies, dxJoint *const *&joints) +void dxWorldProcessContext::RetrievePreallocations(int &islandcount, int const *&islandsizes, dxBody *const *&bodies, dxJoint *const *&joints, size_t const *&islandreqs) { islandcount = m_IslandCount; + islandreqs = m_pIslandReqs; islandsizes = m_pIslandSizes; bodies = m_pBodies; joints = m_pJoints; @@ -64,6 +68,7 @@ { // m_IslandCount = -- no offset for count m_pIslandSizes = m_pIslandSizes ? (int const *)((size_t)m_pIslandSizes + stOffset) : NULL; + m_pIslandReqs = m_pIslandReqs ? (size_t const *)((size_t)m_pIslandReqs + stOffset) : NULL; m_pBodies = m_pBodies ? (dxBody *const *)((size_t)m_pBodies + stOffset) : NULL; m_pJoints = m_pJoints ? 
(dxJoint *const *)((size_t)m_pJoints + stOffset) : NULL; } @@ -72,6 +77,7 @@ { m_IslandCount = othercontext->m_IslandCount; m_pIslandSizes = othercontext->m_pIslandSizes; + m_pIslandReqs = othercontext->m_pIslandReqs; m_pBodies = othercontext->m_pBodies; m_pJoints = othercontext->m_pJoints; } @@ -80,6 +86,7 @@ { m_IslandCount = 0; m_pIslandSizes = NULL; + m_pIslandReqs = NULL; m_pBodies = NULL; m_pJoints = NULL; } @@ -370,6 +377,8 @@ size_t islandcounts = dEFFICIENT_SIZE(world->nb * 2 * sizeof(int)); res += islandcounts; + size_t islandreqs = dEFFICIENT_SIZE(world->nb * sizeof(size_t)); + res += islandreqs; size_t bodiessize = dEFFICIENT_SIZE(world->nb * sizeof(dxBody*)); size_t jointssize = dEFFICIENT_SIZE(world->nj * sizeof(dxJoint*)); @@ -379,6 +388,11 @@ return res; } +// sorts out islands, +// cllocates array for island information into arrays: body[nj], joint[nb], islandsizes[2*nb] +// context->SavePreallocations(islandcount, islandsizes, body, joint,islandreqs); +// and put into context +// static size_t BuildIslandsAndEstimateStepperMemoryRequirements(dxWorldProcessContext *context, dxWorld *world, dReal stepsize, dmemestimate_fn_t stepperestimate) { @@ -392,55 +406,64 @@ // Make array for island body/joint counts int *islandsizes = context->AllocateArray<int>(2 * nb); int *sizescurr; + size_t *islandreqs = context->AllocateArray<size_t>(nb); + size_t *islandreqscurr; // make arrays for body and joint lists (for a single island) to go into - dxBody **body = context->AllocateArray<dxBody *>(nb); - dxJoint **joint = context->AllocateArray<dxJoint *>(nj); + dxBody **body = context->AllocateArray<dxBody *>(nb); // allocates a block of pointers and get back a pointer to first element + dxJoint **joint = context->AllocateArray<dxJoint *>(nj); // allocates a block of pointers and get back a pointer to first element BEGIN_STATE_SAVE(context, stackstate) { - // allocate a stack of unvisited bodies in the island. 
the maximum size of + // stack is used to hold untagged bodies when traversing through all the joint-linked bodies. + // at the end, all the bodies in the stack are popped back out into the island. + // + // allocate a stack of UNVISITED BODIES in the island. the maximum size of // the stack can be the lesser of the number of bodies or joints, because // new bodies are only ever added to the stack by going through untagged // joints. all the bodies in the stack must be tagged! int stackalloc = (nj < nb) ? nj : nb; - dxBody **stack = context->AllocateArray<dxBody *>(stackalloc); + dxBody **stack = context->AllocateArray<dxBody *>(stackalloc); // a body stack { - // set all body/joint tags to 0 - for (dxBody *b=world->firstbody; b; b=(dxBody*)b->next) b->tag = 0; - for (dxJoint *j=world->firstjoint; j; j=(dxJoint*)j->next) j->tag = 0; + // set all body/joint island_tags to 0 + for (dxBody *b=world->firstbody; b; b=(dxBody*)b->next) b->island_tag = 0; + for (dxJoint *j=world->firstjoint; j; j=(dxJoint*)j->next) j->island_tag = 0; } + int island_count = 0; sizescurr = islandsizes; + islandreqscurr = islandreqs; dxBody **bodystart = body; dxJoint **jointstart = joint; + // loop through all body, tag each one as it is processed + // every step in this for loop is one island for (dxBody *bb=world->firstbody; bb; bb=(dxBody*)bb->next) { // get bb = the next enabled, untagged body, and tag it - if (bb->tag || (bb->flags & dxBodyDisabled)) continue; - bb->tag = 1; + if (bb->island_tag || (bb->flags & dxBodyDisabled)) continue; + bb->island_tag = 1; dxBody **bodycurr = bodystart; dxJoint **jointcurr = jointstart; // tag all bodies and joints starting from bb. 
-    *bodycurr++ = bb;
+    *bodycurr++ = bb; // adding to the context memory
     int stacksize = 0;
     dxBody *b = bb;
     while (true) {
-      // traverse and tag all body's joints, add untagged connected bodies
+      // traverse and island_tag all body's joints, add untagged connected bodies
       // to stack
       for (dxJointNode *n=b->firstjoint; n; n=n->next) {
         dxJoint *njoint = n->joint;
-        if (!njoint->tag && njoint->isEnabled()) {
-          njoint->tag = 1;
+        if (!njoint->island_tag && njoint->isEnabled()) {
+          njoint->island_tag = 1;
           *jointcurr++ = njoint;
           dxBody *nbody = n->body;
           // Body disabled flag is not checked here. This is how auto-enable works.
-          if (nbody && !nbody->tag) {
-            nbody->tag = 1;
+          if (nbody && !nbody->island_tag) {
+            nbody->island_tag = 1;
             // Make sure all bodies are in the enabled state.
             nbody->flags &= ~dxBodyDisabled;
             stack[stacksize++] = nbody;
@@ -464,13 +487,16 @@
     sizescurr[1] = jcount;
     sizescurr += sizeelements;
-    size_t islandreq = stepperestimate(bodystart, bcount, jointstart, jcount);
-    maxreq = (maxreq > islandreq) ? maxreq : islandreq;
+    *islandreqscurr = stepperestimate(bodystart, bcount, jointstart, jcount);
+    maxreq = (maxreq > *islandreqscurr) ? maxreq : *islandreqscurr;
+    //printf("island %d complete, stepper %d maxreq %d \n",island_count++,*islandreqscurr, maxreq);
+    islandreqscurr += 1;
+    // pointer to next free context
     bodystart = bodycurr;
     jointstart = jointcurr;
   }
-  } END_STATE_SAVE(context, stackstate);
+  } END_STATE_SAVE(context, stackstate); // restores context pointer m_pAllocCurrent back to what it was before this block
 # ifndef dNODEBUG
   // if debugging, check that all objects (except for disabled bodies,
@@ -479,10 +505,10 @@
   {
     for (dxBody *b=world->firstbody; b; b=(dxBody*)b->next) {
       if (b->flags & dxBodyDisabled) {
-        if (b->tag) dDebug (0,"disabled body tagged");
+        if (b->island_tag) dDebug (0,"disabled body tagged");
       }
       else {
-        if (!b->tag) dDebug (0,"enabled body not tagged");
+        if (!b->island_tag) dDebug (0,"enabled body not tagged");
       }
     }
     for (dxJoint *j=world->firstjoint; j; j=(dxJoint*)j->next) {
@@ -490,18 +516,24 @@
         (j->node[1].body && (j->node[1].body->flags & dxBodyDisabled)==0) )
         &&
         j->isEnabled() ) {
-        if (!j->tag) dDebug (0,"attached enabled joint not tagged");
+        if (!j->island_tag) dDebug (0,"attached enabled joint not tagged");
       }
       else {
-        if (j->tag) dDebug (0,"unattached or disabled joint tagged");
+        if (j->island_tag) dDebug (0,"unattached or disabled joint tagged");
       }
     }
   }
 # endif
   int islandcount = (sizescurr - islandsizes) / sizeelements;
-  context->SavePreallocations(islandcount, islandsizes, body, joint);
+  context->SavePreallocations(islandcount, islandsizes, body, joint,islandreqs);
+  //printf("total island count: %d\n",islandcount);
+  for (int j=0; j<islandcount; j++)
+  {
+    //printf("island:%d bodycount:%d jointcount:%d islandreqs:%d \n",j,islandsizes[2*j],islandsizes[2*j+1],islandreqs[j]);
+  }
+
   return maxreq;
 }
@@ -516,6 +548,20 @@
 // bodies will not be included in the simulation. disabled bodies are
 // re-enabled if they are found to be part of an active island.
+void dxProcessOneIsland(dxWorldProcessContext *context,dxWorldProcessContext *island_context, dxWorld *world, dReal stepsize, dstepper_fn_t stepper,
+                        dxBody *const* bodystart,
+                        int bcount,
+                        dxJoint *const *jointstart,
+                        int jcount)
+{
+  dAllocateODEDataForThread(dAllocateMaskAll);
+
+  BEGIN_STATE_SAVE(island_context, island_stepperstate) {
+    stepper (context,island_context,world,bodystart,bcount,jointstart,jcount,stepsize);
+  } END_STATE_SAVE(island_context, island_stepperstate);
+  dCleanupODEAllDataForThread();
+}
+
 void dxProcessIslands (dxWorld *world, dReal stepsize, dstepper_fn_t stepper)
 {
   const int sizeelements = 2;
@@ -526,28 +572,43 @@
   dxWorldProcessContext *context = wmem->GetWorldProcessingContext();
   int islandcount;
+  size_t const *islandreqs;
   int const *islandsizes;
   dxBody *const *body;
   dxJoint *const *joint;
-  context->RetrievePreallocations(islandcount, islandsizes, body, joint);
+  context->RetrievePreallocations(islandcount, islandsizes, body, joint, islandreqs);
   dxBody *const *bodystart = body;
   dxJoint *const *jointstart = joint;
+  int island_index = 0;
   int const *const sizesend = islandsizes + islandcount * sizeelements;
   for (int const *sizescurr = islandsizes; sizescurr != sizesend; sizescurr += sizeelements) {
     int bcount = sizescurr[0];
     int jcount = sizescurr[1];
-    BEGIN_STATE_SAVE(context, stepperstate) {
-      // now do something with body and joint lists
-      stepper (context,world,bodystart,bcount,jointstart,jcount,stepsize);
-    } END_STATE_SAVE(context, stepperstate);
+    //printf("debug: islandcount %d bcount %d jcount %d \n", islandcount,bcount, jcount);
+    dxStepWorkingMemory *island_wmem = world->island_wmems[island_index];
+    island_index++;
+    dIASSERT(island_wmem != NULL);
+    dxWorldProcessContext *island_context = island_wmem->GetWorldProcessingContext();
+
+#define TPOOLISLAND
+#ifdef TPOOLISLAND
+    world->threadpool->schedule(boost::bind(dxProcessOneIsland,context,island_context, world, stepsize, stepper,bodystart, bcount, jointstart, jcount));
+#else
+    dxProcessOneIsland(context,island_context, world, stepsize, stepper,bodystart, bcount, jointstart, jcount);
+#endif
+
     bodystart += bcount;
     jointstart += jcount;
   }
+  world->threadpool->wait();
+  for (int jj=0; jj < islandcount; jj++)
+    world->island_wmems[jj]->GetWorldProcessingContext()->CleanupContext();
+
   context->CleanupContext();
   dIASSERT(context->IsStructureValid());
 }
@@ -703,7 +764,7 @@
 bool dxReallocateWorldProcessContext (dxWorld *world, dReal stepsize, dmemestimate_fn_t stepperestimate)
 {
-  dxStepWorkingMemory *wmem = AllocateOnDemand(world->wmem);
+  dxStepWorkingMemory *wmem = AllocateOnDemand(world->wmem); // this is starting a new instance of dxStepWorkingMemory
   if (!wmem) return false;
   dxWorldProcessContext *oldcontext = wmem->GetWorldProcessingContext();
@@ -714,6 +775,10 @@
   dxWorldProcessContext *context = oldcontext;
+  // EstimateIslandsProcessingMemoryRequirements allocates memory for 3 arrays:
+  //   islandsizes: integer arrays, 2*n_islands in size, contains bodycount and jointcount for each island
+  //   body: one array with all the 'active' bodies, all indexed by islandsizes
+  //   joint: one array with all the 'active' joints, all indexed by islandsizes
   size_t sesize;
   size_t islandsreq = EstimateIslandsProcessingMemoryRequirements(world, sesize);
   dIASSERT(islandsreq == dEFFICIENT_SIZE(islandsreq));
@@ -722,16 +787,45 @@
   size_t stepperestimatereq = islandsreq + sesize;
   context = InternalReallocateWorldProcessContext(context, stepperestimatereq, memmgr, 1.0f, reserveinfo->m_uiReserveMinimum);
+  //
+  // above context allocation of the island arrays is successful, then we proceed to allocate more spaces for the actual stepping work
+  //
+  // we want to start multiple contexts, one for each island.
+  //
   if (context)
   {
     size_t stepperreq = BuildIslandsAndEstimateStepperMemoryRequirements(context, world, stepsize, stepperestimate);
     dIASSERT(stepperreq == dEFFICIENT_SIZE(stepperreq));
-    size_t memreq = stepperreq + islandsreq;
-    context = InternalReallocateWorldProcessContext(context, memreq, memmgr, reserveinfo->m_fReserveFactor, reserveinfo->m_uiReserveMinimum);
+    // retrieve results of BuildIslandsAndEstimateStepperMemoryRequirements
+    int islandcount;
+    size_t const *islandreqs;
+    int const *islandsizes;
+    dxBody *const *body;
+    dxJoint *const *joint;
+    context->RetrievePreallocations(islandcount, islandsizes, body, joint, islandreqs);
+
+    for (int jj = 0; jj < islandcount; jj++)
+    {
+      // for individual islands
+      dxStepWorkingMemory *island_wmem = AllocateOnDemand(world->island_wmems[jj]); // this is starting a new instance of dxStepWorkingMemory
+      if (!island_wmem) return false;
+
+      dxWorldProcessContext *island_oldcontext = island_wmem->GetWorldProcessingContext();
+      dIASSERT (!island_oldcontext || island_oldcontext->IsStructureValid());
+
+      const dxWorldProcessMemoryReserveInfo *island_reserveinfo = island_wmem->SureGetMemoryReserveInfo();
+      const dxWorldProcessMemoryManager *island_memmgr = island_wmem->SureGetMemoryManager();
+
+      dxWorldProcessContext *island_context = island_oldcontext;
+
+      size_t island_memreq = islandreqs[jj]; // + islandsreq;
+      island_context = InternalReallocateWorldProcessContext(island_context, island_memreq, island_memmgr, island_reserveinfo->m_fReserveFactor, island_reserveinfo->m_uiReserveMinimum);
+      island_wmem->SetWorldProcessingContext(island_context); // set dxStepWorkingMemory to context
+    }
   }
-  wmem->SetWorldProcessingContext(context);
+  wmem->SetWorldProcessingContext(context); // set dxStepWorkingMemory to context
   return context != NULL;
 }
Index: ode/src/step.h
===================================================================
--- ode/src/step.h (revision 1706)
+++ ode/src/step.h (working copy)
@@ -28,7 +28,7 @@
 size_t dxEstimateStepMemoryRequirements (
   dxBody * const *body, int nb, dxJoint * const *_joint, int _nj);
-void dInternalStepIsland (dxWorldProcessContext *context, dxWorld *world,
+void dInternalStepIsland (dxWorldProcessContext *shared_context,dxWorldProcessContext *context, dxWorld *world,
   dxBody * const *body, int nb,
   dxJoint * const *joint, int nj,
   dReal stepsize);
Index: ode/src/objects.h
===================================================================
--- ode/src/objects.h (revision 1706)
+++ ode/src/objects.h (working copy)
@@ -30,6 +30,7 @@
 #include <ode/memory.h>
 #include <ode/mass.h>
 #include "array.h"
+#include <boost/threadpool.hpp>
 class dxStepWorkingMemory;
@@ -66,6 +67,7 @@
   dObject *next; // next object of this type in list
   dObject **tome; // pointer to previous object's next ptr
   int tag; // used by dynamics algorithms
+  int island_tag; // used by island algorithms for grouping
   void *userdata; // user settable data
   dObject(dxWorld *w);
   virtual ~dObject() { }
@@ -151,11 +153,13 @@
   dxAutoDisable adis; // auto-disable parameters
   int body_flags; // flags for new bodies
   dxStepWorkingMemory *wmem; // Working memory object for dWorldStep/dWorldQuickStep
+  dxStepWorkingMemory *island_wmems[1000]; // Working memory object for dWorldStep/dWorldQuickStep
   dxQuickStepParameters qs;
   dxContactParameters contactp;
   dxDampingParameters dampingp; // damping parameters
   dReal max_angular_speed; // limit the angular velocity to this magnitude
+  boost::threadpool::pool *threadpool;
 };
Index: ode/src/collision_kernel.h
===================================================================
--- ode/src/collision_kernel.h (revision 1706)
+++ ode/src/collision_kernel.h (working copy)
@@ -35,6 +35,7 @@
 #include "config.h"
 #include "objects.h"
 #include "odetls.h"
+#include <boost/thread/mutex.hpp>
 //****************************************************************************
 // constants and macros
@@ -196,6 +204,8 @@
   int sublevel; // space sublevel (used in dSpaceCollide2). NOT TRACKED AUTOMATICALLY!!!
   unsigned tls_kind; // space TLS kind to be used for global caches retrieval
+  boost::mutex mutex;
+
   // cached state for getGeom()
   int current_index; // only valid if current_geom != 0
   dxGeom *current_geom; // if 0 then there is no information
Index: ode/src/ode.cpp
===================================================================
--- ode/src/ode.cpp (revision 1706)
+++ ode/src/ode.cpp (working copy)
@@ -1558,6 +1558,10 @@
   w->body_flags = 0; // everything disabled
   w->wmem = 0;
+  for (int jj=0; jj < 1000; jj++)
+  {
+    w->island_wmems[jj] = 0;
+  }
   w->adis.idle_steps = 10;
   w->adis.idle_time = 0;
@@ -1577,6 +1581,8 @@
   w->dampingp.angular_threshold = REAL(0.01) * REAL(0.01);
   w->max_angular_speed = dInfinity;
+  w->threadpool = new boost::threadpool::pool(4);
+
   return w;
 }
Index: ode/src/step.cpp
===================================================================
--- ode/src/step.cpp (revision 1706)
+++ ode/src/step.cpp (working copy)
@@ -839,7 +839,8 @@
 //****************************************************************************
-void dInternalStepIsland (dxWorldProcessContext *context,
+void dInternalStepIsland (dxWorldProcessContext *shared_context,
+                          dxWorldProcessContext *context,
                           dxWorld *world, dxBody * const *body, int nb,
                           dxJoint * const *joint, int nj, dReal stepsize)
 {
|
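The core idea of the patch's island search (a stack-based connected-component walk that uses its own island_tag, so the solver's tag is never touched) can be sketched outside ODE. This is only an illustration under stated assumptions: Body, Joint and find_islands are hypothetical stand-ins, not ODE types or functions, and std::vector is used for brevity even though ODE itself avoids the STL.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for dxBody/dxJoint; these are NOT ODE's real types.
struct Body {
    bool enabled = true;
    int tag = 0;         // left untouched here, free for the solver to use
    int island_tag = 0;  // used only by the island search below
    std::vector<std::size_t> joints;  // indices of joints attached to this body
};

struct Joint {
    int island_tag = 0;
    std::size_t a = 0, b = 0;  // indices of the two connected bodies
};

// Returns one list of body indices per island.
std::vector<std::vector<std::size_t> > find_islands(std::vector<Body>& bodies,
                                                    std::vector<Joint>& joints) {
    // Reset only the island tags, as the patch does.
    for (Body& b : bodies) b.island_tag = 0;
    for (Joint& j : joints) j.island_tag = 0;

    std::vector<std::vector<std::size_t> > islands;
    std::vector<std::size_t> stack;  // bodies that are tagged but not yet expanded
    for (std::size_t start = 0; start < bodies.size(); ++start) {
        // Each enabled, untagged body seeds a new island.
        if (bodies[start].island_tag || !bodies[start].enabled) continue;
        bodies[start].island_tag = 1;
        islands.emplace_back();
        std::vector<std::size_t>& island = islands.back();
        stack.push_back(start);
        while (!stack.empty()) {
            std::size_t cur = stack.back();
            stack.pop_back();
            island.push_back(cur);
            for (std::size_t ji : bodies[cur].joints) {
                Joint& j = joints[ji];
                if (j.island_tag) continue;
                j.island_tag = 1;
                std::size_t other = (j.a == cur) ? j.b : j.a;
                if (!bodies[other].island_tag) {
                    bodies[other].island_tag = 1;
                    bodies[other].enabled = true;  // mirrors the auto-enable step
                    stack.push_back(other);
                }
            }
        }
    }
    return islands;
}
```

Because the search writes only island_tag, a stepper that reuses tag for its own bookkeeping cannot disturb the grouping while other islands are still being processed.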
|
|
Re: multi-thread islands
External libraries (boost, stl, etc.) are not allowed.

Oleh Derevenko
-- Skype with underscore
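For reference, the island dispatch does not strictly need Boost. Below is a minimal sketch using plain POSIX threads, assuming a POSIX platform (pthreads is not part of a stock Visual Studio install). IslandJob, step_island, WorkerArgs and run_islands are hypothetical names for illustration only; step_island is a placeholder for the real stepper call and nothing here is ODE API.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical per-island job; in the patch this role is played by the
// arguments bound to dxProcessOneIsland.
struct IslandJob {
    int body_count;
    int joint_count;
    long result;  // filled in by the worker
};

// Placeholder for the real stepper call (an assumption, not ODE code).
static void step_island(IslandJob* job) {
    job->result = 10L * job->body_count + job->joint_count;
}

struct WorkerArgs {
    IslandJob* jobs;
    std::size_t first, last;  // half-open range [first, last) of jobs
};

static void* worker_main(void* p) {
    WorkerArgs* args = static_cast<WorkerArgs*>(p);
    for (std::size_t i = args->first; i < args->last; ++i)
        step_island(&args->jobs[i]);
    return NULL;
}

const std::size_t kMaxWorkers = 16;

// Splits the jobs into contiguous chunks, runs each chunk on its own POSIX
// thread, and waits for all of them. Returns false if a thread could not be
// created (jobs in chunks that never started are then left unprocessed).
bool run_islands(IslandJob* jobs, std::size_t count, std::size_t nthreads) {
    if (nthreads == 0) nthreads = 1;
    if (nthreads > kMaxWorkers) nthreads = kMaxWorkers;
    if (nthreads > count) nthreads = count;
    pthread_t threads[kMaxWorkers];
    WorkerArgs args[kMaxWorkers];
    std::size_t started = 0;
    bool ok = true;
    for (std::size_t t = 0; t < nthreads; ++t) {
        args[t].jobs = jobs;
        args[t].first = count * t / nthreads;
        args[t].last = count * (t + 1) / nthreads;
        if (pthread_create(&threads[t], NULL, worker_main, &args[t]) != 0) {
            ok = false;
            break;
        }
        ++started;
    }
    for (std::size_t t = 0; t < started; ++t) pthread_join(threads[t], NULL);
    return ok;
}
```

Each worker writes only to its own jobs, so no locking is needed in this sketch; shared structures such as the world's body list would still need protection. On older toolchains the program has to be linked with -pthread.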
|
|
|
Re: multi-thread islands
Is lpthread an efficient way to do this?

On Oct 29, 16:15, Oleh Derevenko <Oleh.Dereve...@...> wrote:
> External libraries (boost, stl, etc.) are not allowed.
>
> Oleh Derevenko
> -- Skype with underscore
>
> ----- Original Message -----
> From: John Hsu
> To: ode-users@...
> Sent: Thursday, October 29, 2009 7:54 PM
> Subject: [ode-users] Re: multi-thread islands
>
> Hi All,
> Attached is my patch for multi-threading islands in ODE. Please feel free to test it and send feedback, questions, or comments. Thank you.
> John
>
> On Sun, Oct 4, 2009 at 11:51 PM, Nico Kruithof <nicokruit...@...> wrote:
>
> Hi John,
>
> Thanks for the timings. It looks really impressive. An improvement of 33% is good! I assume that Intel had more than 3 islands when they got the 1.29x speedup.
>
> A bit more context on the Intel Threading Building Blocks, as I realised that I was really coarse with the reference. It is a library that allows you to split a program into "tasks" which are then (automatically) spread over several threads. Since starting a task is less expensive than starting a thread, the overhead of the parallelism should decrease. In the book about the library they have an example where they parallelized ODE. They started one instance of "find island" and, after finding the first island, they start another thread to process it while finding the next island. Your quote comes from this example as well. They mention that the sample code is available for download as well, but I (Google) couldn't find it.
>
> Best,
> Nico
>
> On Sat, Oct 3, 2009 at 4:08 AM, John Hsu <hsujohn...@...> wrote:
>
> Hi Nico,
>
> Looking at Intel's example, quoting from the book:
> "The first attempt gave a 1.1X speedup, whereas the second effort gave us a 1.29X speedup when run with 400 simple objects on a quad-core system, all in the span of less than a day of effort."
> Is this what you were referring to?
>
> For our application we're simulating multiple robots, each having 31 DOFs, assuming island==robot if the robots are not in contact with non-static objects. Here are some very rough preliminary numbers with 4 threads on my desktop.
>
> quickstep with 10 inner iterations, ratio of elapsed sim-time to real-time:
> 1 robot: ~1.16X with MT and ~1.20X w/o MT
> 2 robots: ~.58X with MT and ~.54X w/o MT
> 3 robots: ~.33X with MT and ~.30X w/o MT
>
> Given that quickstep is quite efficient, I expect to see better speedup with more inner iterations as accuracy requirements are increased. Here's quickstep with 60 inner iterations, slightly better speedup as expected:
> 1 robot: ~.53X with MT and ~.55X w/o MT
> 2 robots: ~.32X with MT and ~.25X w/o MT
> 3 robots: ~.20X with MT and ~.15X w/o MT
>
> Disclaimer: the numbers here were not acquired in a very controlled environment. In this test case, collision detection, sensor generation (cameras, range scans, etc.) and motor controllers all take up significant amounts of total simulation time. I'll post more results as a more detailed study becomes available.
>
> If LCP is taking up most of the computational effort (for example, stable box stacks w/o disabling bodies), the multi-threaded island speedup is closer to the ideal speedup.
>
> On a separate topic, I am curious if anyone has any recent benchmark comparisons between different physics engines as well.
>
> Thanks,
> John
>
> On Thu, Oct 1, 2009 at 3:50 AM, Nico Kruithof <nicokruit...@...> wrote:
>
> Hi John,
>
> Intel did something similar with their Intel Threading Building Blocks. Do you know how your implementation compares?
>
> Nico
>
> On Wed, Sep 30, 2009 at 11:02 PM, John Hsu <hsujohn...@...> wrote:
> > Ok, finally got ODE running island threads using threadpools. I can get pretty good speedup running multiple robots when they are not holding hands (e.g. http://ros.org/wiki/simulator_gazebo/Tutorials).
> > The patch still needs cleanup, I'll post it when it's a bit cleaner.
> > Basically I created multiple contexts, one per island and one shared context for island arrays. Also, added locks for collision_space geom linked list operations.
> > Thanks everyone for helping!
> > John
>
> > On Thu, Sep 17, 2009 at 3:14 PM, John Hsu <hsujohn...@...> wrote:
>
> >> Hi jcooper,
> >> Thank you for these very helpful tips. I think you mentioned in previous discussions that you had multi-threaded ODE before the new stack replacement was implemented by Oleh. Do you by any chance still have some benchmark results?
> >> As for the overhead, I figured if simple island threading works, I can then implement load balancing to group the islands. Plus the problem we are solving might naturally have substantial bodies/workload per island.
> >> John
>
> >> On Thu, Sep 17, 2009 at 7:50 AM, jcooper <josephcoo...@...> wrote:
>
> >>> > Thanks for the tips. In my naive initial attempt, with TLS (--enable-ou) and using dWorldQuickStep, I simply offloaded the following stepper calls in dxProcessIslands to a threadpool for each individual island,
>
> >>> I sort of did this using dWorldStep. There are two things I can think of that you should take care of if you haven't already. They might also apply to dWorldQuickStep:
>
> >>> The stepping function finds the separate islands through a fairly simple connected-component algorithm. This uses a "tag" variable that is local to the "body" object. However, the constraint satisfaction algorithm also used the same variable. Fixing this just required creating a second "tag" variable so that the connected-component calculation didn't get stomped while islands were being processed.
>
> >>> The other thing is that every body that gets updated is automatically moved to the head of the linked list stored in the World (so that inactive bodies can sit at the end of the list and not take up processing time). Changing a linked list is clearly not threadsafe, so you need to fix that.
>
> >>> I can't remember for sure, but there were probably a couple of other issues.
>
> >>> Incidentally, if you try to just spin each and every island into its own thread or onto a thread-pool queue, odds are good that you'll generally get worse performance than the single-threaded version on the same problem. The overhead of creating threads or synchronizing the work-queue (not to mention the other synchronization requirements such as protecting the linked lists) will destroy your performance if, for example, you have a large number of single-body islands. You'll do a lot better if you have some way of estimating the amount of work required (how many bodies/constraints in the island) and then partition the work to different threads in larger-grain, as-equal-as-possible chunks.
>
> >>> jc
|
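jcooper's closing suggestion (estimate the work per island, then hand each thread a larger, roughly equal chunk) can be sketched as a greedy "largest first" partition. This is only an illustration: partition_islands and IslandCost are hypothetical names, the cost of an island is assumed to be something like its body count plus joint count (or the per-island stepper estimate from the patch), and std::vector is used for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical pairing of an island index with its estimated cost.
struct IslandCost {
    std::size_t index;
    std::size_t cost;
};

// Orders islands by decreasing cost; ties are broken by index so the
// result is deterministic.
static bool by_cost_desc(const IslandCost& a, const IslandCost& b) {
    if (a.cost != b.cost) return a.cost > b.cost;
    return a.index < b.index;
}

// Greedy partition: visit islands from most to least expensive and give each
// one to the currently least-loaded thread. Returns, for each thread, the
// indices of the islands assigned to it.
std::vector<std::vector<std::size_t> > partition_islands(
        const std::vector<std::size_t>& costs, std::size_t nthreads) {
    if (nthreads == 0) nthreads = 1;
    std::vector<IslandCost> order;
    for (std::size_t i = 0; i < costs.size(); ++i) {
        IslandCost ic = { i, costs[i] };
        order.push_back(ic);
    }
    std::sort(order.begin(), order.end(), by_cost_desc);

    std::vector<std::vector<std::size_t> > bins(nthreads);
    std::vector<std::size_t> load(nthreads, 0);
    for (std::size_t k = 0; k < order.size(); ++k) {
        // min_element returns the first of the least-loaded threads.
        std::size_t lightest = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        bins[lightest].push_back(order[k].index);
        load[lightest] += order[k].cost;
    }
    return bins;
}
```

With this grouping, a scene made of a few large islands and many single-body islands produces one task per thread rather than one task per island, which is the overhead jcooper warns about.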
|
|
Re: multi-thread islands
My Visual Studio does not contain anything named lpthread.

Oleh Derevenko
-- Skype with underscore

----- Original Message -----
From: "newbie_x11" <newbie.x11@...>
To: "ode-users" <ode-users@...>
Sent: Thursday, October 29, 2009 8:52 PM
Subject: [ode-users] Re: multi-thread islands

Is lpthread an efficient way to do this?
|