Question about bug 292049

I'm revising the related source code as required, but I have run into a problem while trying to create SLURMJobAttributes instead of modifying JobAttributes.

When launching a job, the createjob() method in AbstractToolRuntimeSystem.java needs to copy job attributes from "attrMgr" to "jobAttrMgr". For a SLURM RMS job launch, the job attributes must include the "numofnodes" and "timeLimit" attributes. If these job attributes are moved to SLURMJobAttributes.java, build errors occur in the createjob() method:

    ...
    Integer jobNumNodes = attrMgr.getAttribute(SLURMJobAttributes.getJobNumberOfNodesAttributeDefinition()).getValue();
    Integer jobTimeLimit = attrMgr.getAttribute(SLURMJobAttributes.getJobTimeLimitAttributeDefinition()).getValue();
    ...
    jobAttrMgr.addAttribute(SLURMJobAttributes.getJobNumberOfNodesAttributesDefinition().create(jobNumNodes));
    jobAttrMgr.addAttribute(SLURMJobAttributes.getJobTimeLimitAttributesDefinition().create(jobTimeLimit));
    ...

However, when I try to import the org.eclipse.ptp.rm.slurm.core package (which includes the SLURMJobAttributes class) in AbstractToolRuntimeSystem.java, the compiler reports "The import org.eclipse.ptp.rm.slurm cannot be resolved".

On the other hand, "attrMgr" (which is initialized from the launch configuration) is ONLY processed in AbstractToolRuntimeSystem.java, so it is not possible to add more attributes from the doCreateJob() method in SLURMResourceManager.java.

If I implement SLURMJobAttributes.java, how can I add SLURM-specific job attributes when launching a job without changing JobAttributes.java?

Regards,
Jie

_______________________________________________
ptp-dev mailing list
ptp-dev@...
https://dev.eclipse.org/mailman/listinfo/ptp-dev
|
|
Re: Question about bug 292049

Jie,

You shouldn't change anything in org.eclipse.ptp.rm.core/ui. In fact, you shouldn't need to use AbstractToolRuntimeSystem at all. This is only used for resource managers like Open MPI and MPICH2. All the SLURM-specific code must reside in the slurm.core and slurm.ui plugins.

Add the code that I put in the bug to the existing SLURMResourceManager class in doCreateRuntimeSystem(). The SLURM attributes should be created in your SLURMRMLaunchConfigurationDynamicTab and returned via the getAttributes() method. These will be automatically passed to the submitJob method, and you will receive them in your SLURM proxy.

Greg

(In reply to JiangJie's message of Nov 2, 2009, 7:34 AM.)
|
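The layering Greg describes can be illustrated with a minimal, self-contained sketch. It does not use the real PTP classes: IntAttributeDef, IntAttribute, SlurmJobAttributesSketch and SlurmLaunchTabSketch are hypothetical stand-ins, and only the attribute names "numofnodes" and "timeLimit" come from the thread. The point is that the SLURM-only definitions live next to the launch tab that creates them, so nothing in the core code has to import them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an integer attribute definition (not the PTP class). */
final class IntAttributeDef {
    final String id;

    IntAttributeDef(String id) {
        this.id = id;
    }

    /** Creates an attribute value bound to this definition. */
    IntAttribute create(int value) {
        return new IntAttribute(this, value);
    }
}

/** Hypothetical attribute: a definition plus a concrete value. */
final class IntAttribute {
    final IntAttributeDef def;
    final int value;

    IntAttribute(IntAttributeDef def, int value) {
        this.def = def;
        this.value = value;
    }
}

/** SLURM-only definitions, kept in the SLURM plugin so core code never imports them. */
final class SlurmJobAttributesSketch {
    static final IntAttributeDef NUM_NODES = new IntAttributeDef("numofnodes");
    static final IntAttributeDef TIME_LIMIT = new IntAttributeDef("timeLimit");

    private SlurmJobAttributesSketch() {
    }
}

/** Hypothetical launch tab: turns the values the user entered into attributes. */
final class SlurmLaunchTabSketch {
    private final int numNodes;
    private final int timeLimit;

    SlurmLaunchTabSketch(int numNodes, int timeLimit) {
        this.numNodes = numNodes;
        this.timeLimit = timeLimit;
    }

    /** Returns the SLURM-specific attributes that accompany the job submission. */
    List<IntAttribute> getAttributes() {
        List<IntAttribute> attrs = new ArrayList<>();
        attrs.add(SlurmJobAttributesSketch.NUM_NODES.create(numNodes));
        attrs.add(SlurmJobAttributesSketch.TIME_LIMIT.create(timeLimit));
        return attrs;
    }
}
```

In this arrangement the generic launch path only ever sees a list of attributes, so adding a new SLURM attribute touches the SLURM plugin alone.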
|
RE: Question about bug 292049

Thanks for your suggestion. By now, most of the code related to SLURM support is kept in org.eclipse.ptp.rm.slurm.core/ui, except for some minor changes to org.eclipse.ptp.ui.views.ParallelJobsView and org.eclipse.ptp.ui.views.ParallelProcessView. Since the current implementation of ptp_slurm_proxy can't provide the process ID to the PTP UI when launching jobs, we have to display an "N/A" message to avoid giving the user wrong PID information. This problem will be solved in our next version of ptp_slurm_proxy.

For proxy_attr.h, things are more complex. Node states are defined in proxy_attr.h as NODE_STATE_UP/DOWN/ERROR/UNKNOWN. However, the SLURM header file "slurm.h" also defines its own node states, NODE_STATE_UNKNOWN/DOWN/IDLE/ALLOCATED. ptp_slurm_proxy has to include both proxy_attr.h and slurm.h, so NODE_STATE_DOWN and NODE_STATE_UNKNOWN are defined in both header files, which causes a compile error. I can't change slurm.h, since other parts of the SLURM RMS and other SLURM users may rely on this file. So I chose to modify proxy_attr.h by wrapping the conflicting definitions in #ifndef HAVE_SLURM_SLURN_H/#endif, so that they are excluded when building ptp_slurm_proxy. This may not be elegant, but it seems necessary.

Regards,
Jie

(In reply to Greg Watson's message of Mon, 2 Nov 2009 11:01:07 -0500.)
|
|
Re: Question about bug 292049

Hi Jie,

The changes to the views look fine.

To fix the slurm.h problem, I've modified the proxy code to add "PTP_" to the beginning of all proxy*.h constants. Please update the slurm C code to use the new names and hopefully this should resolve the problem.

Regards,
Greg

(In reply to JiangJie's message of Nov 3, 2009, 8:19 AM.)
|
|
RE: Question about bug 292049

Hi Greg,

I'm almost done with the new patch. But during testing, I found a problem that had been solved before.

In SDMDebugger.java, the call to writeRoutingFile() has been moved outside the "if (fSdmRunner != null)" condition that follows it, which defeats the purpose of SLURMServiceProvider.needsDebuggerLaunchHelp(). Even if needsDebuggerLaunchHelp() returns false, the PTP debugger will still try to write the routing file. As we have discussed, the SLURM proxy can't provide enough information for the PTP debugger to generate the routing file. Instead, the proxy writes the routing file on its own.

So is it possible to move the call to writeRoutingFile() inside the "if" condition? (In my CVS update there is a version of PTP where the call to writeRoutingFile() IS inside the "if" condition. When did this change happen?)

Regards,
Jie

(In reply to Greg Watson's message of Tue, 3 Nov 2009 10:04:59 -0500.)
|
|
Re: Question about bug 292049

Jie,

Yes, this should really be inside the 'if', but it was moved because the PE RM does not currently generate a routing file.

Dave, would it be possible to add this to the PE RM? Would it help if I provided some support functions in the utils package?

Greg

(In reply to JiangJie's message of Nov 5, 2009, 9:37 AM.)
|
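The control flow under discussion can be sketched as follows. This is not the SDMDebugger source: RoutingGuardSketch and its step names are hypothetical, and the sketch assumes that the runner object is only created when the resource manager asks for launch help, which is what Jie's description implies. With the routing-file step inside the guard, a proxy that writes its own routing file is left alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a debugger launch; records the steps that ran. */
final class RoutingGuardSketch {
    final List<String> steps = new ArrayList<>();

    /**
     * @param needsDebuggerLaunchHelp assumed to decide whether the debugger
     *        starts the SDM itself (true) or the proxy handles it (false)
     */
    void launch(boolean needsDebuggerLaunchHelp) {
        // Assumption: the runner exists only when launch help is requested.
        Runnable sdmRunner = needsDebuggerLaunchHelp
                ? () -> steps.add("start master SDM")
                : null;

        if (sdmRunner != null) {
            // Inside the guard: only written when the debugger owns the launch.
            steps.add("write routing file");
            sdmRunner.run();
        }
        steps.add("continue launch");
    }
}
```

Calling launch(false) records only "continue launch", while launch(true) records the routing-file step before the SDM start.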
|
Re: Re: Question about bug 292049

Greg,

I already have code in the proxy, #ifdef'ed out for now, that is supposed to generate the routing file. It generates the routing file after the attach.cfg file is read. The code I have now writes one line per task, containing the task index, the hostname and the string '7777' (I don't remember what 7777 is for). This is only a few lines of code, so I should be able to make the change fairly quickly.

My questions are: in what directory do I need to create this file, and how is that directory name passed to the proxy? Currently I think my code picks it up from the PTP_JOB_WORKING_DIR_ATTR passed in the target program invocation request, but I'm not sure whether that's the right value or whether I can count on it always being passed.

Dave

(In reply to Greg Watson's message of 11/05/2009 09:51 AM.)
|
|
Re: Re: Question about bug 292049

Dave,

The debugger also uses the working dir, so that looks correct. I'd suggest checking that it's passed and, if not, just using the current dir.

Greg

(In reply to Dave Wootton's message of Nov 5, 2009, 11:25 AM.)
|
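Greg's suggested fallback is small enough to show directly. The sketch below is written in Java for illustration only (the PE proxy itself is not Java), and the map key "workingDir" is a hypothetical placeholder for whatever name PTP_JOB_WORKING_DIR_ATTR expands to. It uses the working directory if the launch request carries one and otherwise falls back to the process's current directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical helper that picks the directory for the routing file. */
final class RoutingDirSketch {
    /** Placeholder key; the real attribute name is not given in the thread. */
    static final String WORKING_DIR_KEY = "workingDir";

    private RoutingDirSketch() {
    }

    /** Returns the job's working directory, or the current directory if none was passed. */
    static Path routingDir(Map<String, String> launchAttrs) {
        String dir = launchAttrs.get(WORKING_DIR_KEY);
        if (dir == null || dir.isEmpty()) {
            // Fallback suggested in the thread: use the current directory.
            dir = System.getProperty("user.dir");
        }
        return Paths.get(dir);
    }
}
```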
|
Re: Question about bug 292049

I've fixed this now.

Regards,
Greg

(In reply to JiangJie's message of Nov 5, 2009, 9:37 AM.)
|
|
Re: Re: Question about bug 292049

OK, I will try to get this done in the next few days. Two questions:

1) What should I use as the third token in each line? I suspect '7777' was scaffolding code and that I need a real value to put there.
2) How should we coordinate the update of SDMDebugger.java?

Dave

(In reply to Greg Watson's message of 11/05/2009 12:51 PM.)
|
|
Re: Re: Question about bug 292049

The third number is a TCP/IP port number that each process listens on for an incoming connection. The number should be unique within each node (so if two processes are on the same node, their port numbers will be different). It looks like the debugger currently generates a pseudo-random number between 50000 and 60000. It doesn't matter if the port number is being used by another process, as the servers have an internal algorithm to deal with that.

I've already changed the Java code, so as soon as you change the PE RM, the debugger will be working again :-).

Greg

(In reply to Dave Wootton's message of Nov 5, 2009, 1:21 PM.)
|
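Putting the pieces of the thread together, a routing entry is a task index, a hostname and a port, with ports drawn pseudo-randomly from 50000 to 60000 and never repeated on the same host. The sketch below follows that description only; it is not the SDMDebugger or PE proxy code. The class name, the space separator between tokens and the half-open range are assumptions, and any header line the real file may need is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical generator for routing entries: "taskIndex hostname port". */
final class RoutingLinesSketch {
    static final int PORT_BASE = 50000;
    static final int PORT_RANGE = 10000; // ports fall in [50000, 60000)

    private RoutingLinesSketch() {
    }

    /**
     * @param hosts hosts.get(i) is the hostname that runs task i
     * @param rng   source of pseudo-random port numbers
     */
    static List<String> lines(List<String> hosts, Random rng) {
        Map<String, Set<Integer>> usedPerHost = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int task = 0; task < hosts.size(); task++) {
            String host = hosts.get(task);
            Set<Integer> used = usedPerHost.computeIfAbsent(host, h -> new HashSet<>());
            int port;
            do {
                // Redraw until the port is new for this host; this terminates
                // as long as a host runs fewer than PORT_RANGE tasks.
                port = PORT_BASE + rng.nextInt(PORT_RANGE);
            } while (!used.add(port));
            out.add(task + " " + host + " " + port);
        }
        return out;
    }
}
```

Ports may repeat across different hosts, which matches the requirement that uniqueness only matters for processes sharing a node.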
|
Re: Re: Question about bug 292049

Greg,

I just committed the routing file changes for the PE proxy, so the debugger should work again. I modeled the port number generation logic after what you had in SDMDebugger.java.

While I was fixing this, I saw the same "connect: Invalid argument" problem we were looking at last month, this time with just two MPI tasks. I think I know what is going on. It is a sort of timing problem caused by leaving an old routing file hanging around after the debugger exits.

In the PE proxy model, the child SDMs start as the PE application. I think that if the routing file doesn't exist, you have logic where they spin until the routing file appears and the master SDM starts. If there's no routing file, the debugger starts correctly. If there's an old routing file hanging around, the child SDMs read it and get bad port numbers, resulting in the connect failure.

I was reliably able to start the SDM debugger if I deleted the routing file before starting the debugger. If I did not delete the old routing file first, I reliably got either a "connect: Invalid argument" failure or a child SDM exiting with rc -1.

I think the solution is to delete the routing file once the master SDM has initialized. Note that this does not fix the case where somebody starts two debug sessions in the same working directory, since the second debug instance will likely trip over the old routing file. This case is unlikely, but using a unique filename for each routing file could fix it.

Dave

(In reply to Greg Watson's message of 11/05/2009 02:07 PM.)
|
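Dave's two proposals, deleting the routing file once the master SDM is up and giving each session its own filename, can be sketched together. Everything named here is hypothetical: the "routing_file." prefix, the session identifier and the class itself are illustrative, and the sketch assumes the child SDMs have already read the file by the time cleanup runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical cleanup helper for per-session routing files. */
final class RoutingFileCleanupSketch {
    private RoutingFileCleanupSketch() {
    }

    /** Derives a routing file name that differs for each debug session. */
    static String fileNameFor(String sessionId) {
        return "routing_file." + sessionId;
    }

    /**
     * Removes the session's routing file; intended to be called after the
     * master SDM has initialized.
     *
     * @return true if a file was removed, false if none existed
     */
    static boolean removeAfterInit(Path workingDir, String sessionId) throws IOException {
        return Files.deleteIfExists(workingDir.resolve(fileNameFor(sessionId)));
    }
}
```

With distinct names, a second session started in the same working directory would not read the first session's port numbers, and a leftover file from a crashed session would simply be ignored by later ones.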
|
Re: Re: Question about bug 292049Dave,
Ok, great. Would you mind opening a bug with this information in it? That way I can keep track of it. Thanks, Greg On Nov 6, 2009, at 9:46 AM, Dave Wootton wrote: > Greg > I just committed the routing file changes for the PE proxy so the > debugger > should work again. I modeled the port number generation logic after > what > you had in SDMDebugger.java > > While I was fixing this, I saw the same connect: Invalid argument > problem > we were looking at last month, this time with just two MPI tasks. I > think > I know what is going on. This is sort of a timing problem caused by > leaving an old routing file hanging around after the debugger exits. > > In the PE proxy model, the child SDMs start as the PE application. I > think > that if the routing file doesn't exist, you have logic where they spin > until the routing file appears and the master SDM starts. If there's > no > routing file, then the debugger starts correctly. If there's an old > routing file hanging around, then the child SDMs read it and get bad > port > numbers, resulting in the connect failure. > > I was reliably able to start the SDM debugger if I deleted the routing > file before I started the debugger. I was reliably able to get > either a > connect: invalid argument failure or a child SDM exiting with rc -1 > if I > did not delete the old routing file before starting the debugger. > > I think the solution to this is that once the master SDM has > initialized, > delete the routing file. Note that this does not fix the case where > somebody starts two debug sessions in the same working directory > since the > second debug instance will likely trip over the old routing file. This > case is unlikely, but using unique filenames for each routing file > could > fix that. > Dave > > > > From: > Greg Watson <g.watson@...> > To: > Parallel Tools Platform general developers <ptp-dev@...> > Date: > 11/05/2009 02:07 PM > Subject: > Re: [ptp-dev] Re: Question about bug 292049 > Sent by: > ptp-dev-bounces@... 
> > > > The third number is a TCP/IP port number that each process listens on > for an incoming connection. The number should be unique for each node > (so if two processes are on the same node, their port numbers will be > different). It looks like the debugger currently generates a pseudo- > random number between 50000 and 60000. It doesn't matter if the port > number is being used by another process as the servers have an > internal algorithm to deal with that. > > I've already changed the java code, so as soon as you change the PE > RM, the debugger will be working again :-). > > Greg > > On Nov 5, 2009, at 1:21 PM, Dave Wootton wrote: > >> Ok, I will try to get this done in the next few days. Two questions: >> 1)What should I be using as the third token in eack line? I suspect >> '7777' >> was some scaffolding code I had and that I need a real value to put >> there >> 2) How should we coordinate thye update of SDMDebugger.java? >> >> Dave >> >> >> >> From: >> Greg Watson <g.watson@...> >> To: >> Parallel Tools Platform general developers <ptp-dev@...> >> Date: >> 11/05/2009 12:51 PM >> Subject: >> Re: [ptp-dev] Re: Question about bug 292049 >> Sent by: >> ptp-dev-bounces@... >> >> >> >> Dave, >> >> The debugger uses the working dir also, so that looks correct. I'd >> suggest checking it's passed and if not just use the current dir. >> >> Greg >> >> On Nov 5, 2009, at 11:25 AM, Dave Wootton wrote: >> >>> Greg >>> I have code in the proxy already, ifdefed out for now, that is >>> supposed to >>> generate the routing file. The code generates the routing file after >>> the >>> attach.cfg file is read. The code I have now writes one line per >>> task with >>> task index, hostname and the string '7777' (I don't remember what >>> 7777 is >>> for). This is only a few lines of code so I should be able to make >>> the >>> change fairly quickly. 
>>>
>>> The questions I have are what directory do I need to create this in, and how is that directory name passed to the proxy? Currently I think my code is picking it up from the PTP_JOB_WORKING_DIR_ATTR passed in the target program invocation request, but I'm not sure if that's the right value or if I can count on that always being passed.
>>> Dave
>>>
>>> From: Greg Watson <g.watson@...>
>>> To: JiangJie <yangtzj@...>
>>> Cc: ptp-dev@...
>>> Date: 11/05/2009 09:51 AM
>>> Subject: [ptp-dev] Re: Question about bug 292049
>>> Sent by: ptp-dev-bounces@...
>>>
>>> Jie,
>>>
>>> Yes, this should really be inside the 'if', but it was moved because the PE RM does not currently generate a routing file.
>>>
>>> Dave, would it be possible to add this to the PE RM? Would it help if I provided some support functions in the utils package?
>>>
>>> Greg
>>>
>>> On Nov 5, 2009, at 9:37 AM, JiangJie wrote:
>>>
>>> Hi Greg,
>>>
>>> I'm almost done with the new patch. But during testing, I found a problem that had been solved before. In SDMDebugger.java, the call to writeRoutingFile() has been moved outside the "if (fSdmRunner != null)" condition, which defeats the purpose of SLURMServiceProvider.needsDebuggerLaunchHelp(). Even if needsDebuggerLaunchHelp() returns false, the PTP debugger will still try to write the routing file. As we have discussed, the SLURM proxy can't provide enough information for the PTP debugger to generate the routing file. Instead, it writes the routing file on its own.
>>>
>>> So is it possible to move the call to writeRoutingFile() inside the "if" condition? (There is a version of PTP in my CVS update where the call to writeRoutingFile() IS inside the "if" condition. When did this change happen?)
>>>
>>> Regards,
>>> Jie
>>>
>>> Subject: Re: Question about bug 292049
>>> From: g.watson@...
>>> Date: Tue, 3 Nov 2009 10:04:59 -0500
>>> CC: ptp-dev@...
>>> To: yangtzj@...
>>>
>>> Hi Jie,
>>>
>>> The changes to the views look fine.
>>>
>>> To fix the slurm.h problem, I've modified the proxy code to add "PTP_" to the beginning of all proxy*.h constants. Please update the slurm C code to use the new names and hopefully this should resolve the problem.
>>>
>>> Regards,
>>> Greg
|
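For anyone following the routing file discussion above, here is a minimal Java sketch of the layout described in the thread: one line per task holding the task index, the hostname, and a port drawn pseudo-randomly from the 50000 to 60000 range, kept distinct among tasks on the same host. The class and method names (`RoutingFileSketch`, `buildLines`, `write`) and the space-separated layout are assumptions made for illustration. This is not the code in SDMDebugger.java or in the PE proxy, and the real file may carry extra fields or a header line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, not part of PTP.
final class RoutingFileSketch {
    static final int PORT_MIN = 50000; // inclusive lower bound from the thread
    static final int PORT_MAX = 60000; // treated as exclusive here (assumption)

    /** Builds one "index hostname port" line per task; ports differ within a host. */
    static List<String> buildLines(List<String> hostnames, Random rng) {
        Map<String, Set<Integer>> usedByHost = new HashMap<>();
        List<String> lines = new ArrayList<>();
        for (int task = 0; task < hostnames.size(); task++) {
            String host = hostnames.get(task);
            Set<Integer> used = usedByHost.computeIfAbsent(host, h -> new HashSet<>());
            int port;
            do {
                // Redraw until the port is new for this host.
                port = PORT_MIN + rng.nextInt(PORT_MAX - PORT_MIN);
            } while (!used.add(port));
            lines.add(task + " " + host + " " + port);
        }
        return lines;
    }

    /** Writes the lines to the given file, replacing any existing content. */
    static void write(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = buildLines(Arrays.asList("node0", "node0", "node1"), new Random());
        Path file = Files.createTempFile("routing_file", ".txt");
        write(file, lines);
        lines.forEach(System.out::println);
        Files.delete(file);
    }
}
```

The sketch only guarantees uniqueness among the ports it hands out for one host; per Greg's note, a clash with an unrelated process on that port is left to the servers to handle.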
|
|
RE: Re: Question about bug 292049

Hi Greg,

I also encountered the same problem that Dave described. My solution is to delete the "old" routing file generated for the first debug session. However, this is not an elegant solution. A unique file name for each debug session may be a better way. For example, we could append the job ID of the debug session to the routing file name.

Regards,
Jie

> To: ptp-dev@...
> Subject: Re: [ptp-dev] Re: Question about bug 292049
> From: dwootton@...
> Date: Fri, 6 Nov 2009 09:46:34 -0500
>
> Greg
> I just committed the routing file changes for the PE proxy, so the debugger should work again. I modeled the port number generation logic after what you had in SDMDebugger.java.
>
> While I was fixing this, I saw the same "connect: Invalid argument" problem we were looking at last month, this time with just two MPI tasks. I think I know what is going on. This is sort of a timing problem caused by leaving an old routing file hanging around after the debugger exits.
>
> In the PE proxy model, the child SDMs start as the PE application. I think that if the routing file doesn't exist, you have logic where they spin until the routing file appears and the master SDM starts. If there's no routing file, then the debugger starts correctly. If there's an old routing file hanging around, then the child SDMs read it and get bad port numbers, resulting in the connect failure.
>
> I was reliably able to start the SDM debugger if I deleted the routing file before I started the debugger. I was reliably able to get either a "connect: Invalid argument" failure or a child SDM exiting with rc -1 if I did not delete the old routing file before starting the debugger.
>
> I think the solution to this is to delete the routing file once the master SDM has initialized. Note that this does not fix the case where somebody starts two debug sessions in the same working directory, since the second debug instance will likely trip over the old routing file. This case is unlikely, but using unique filenames for each routing file could fix that.
> Dave
|
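The two ideas in this message (deleting a leftover routing file, and giving each debug session its own file name by appending the job ID) can be sketched in Java roughly as follows. Everything named here is hypothetical: `RoutingFileNames`, `forJob`, `deleteStale`, and the `routing_file` base name are placeholders for illustration, not existing PTP code, and the sketch assumes the job ID is safe to use in a file name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of PTP.
final class RoutingFileNames {
    // Placeholder base name; the name the debugger actually uses may differ.
    static final String BASE_NAME = "routing_file";

    /** Returns a per-session routing file path: <workingDir>/routing_file.<jobId>. */
    static Path forJob(Path workingDir, String jobId) {
        return workingDir.resolve(BASE_NAME + "." + jobId);
    }

    /** Deletes a leftover routing file; returns true if a file was removed. */
    static boolean deleteStale(Path routingFile) {
        try {
            return Files.deleteIfExists(routingFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sdm");
        Path file = forJob(dir, "job01");
        Files.createFile(file);                 // simulate a file left by an earlier session
        System.out.println(deleteStale(file));  // a file existed and was removed
        System.out.println(deleteStale(file));  // nothing left to remove
        Files.delete(dir);
    }
}
```

With per-job names, two sessions started in the same working directory would no longer read each other's file, although the master and child SDMs would all need to learn the job ID to agree on the name.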
|
|
Re: Re: Question about bug 292049

Jie, Dave,

I think just deleting the file is fine for now. The routing file is a bit of a hack because it assumes that all processes can access a shared filesystem. I had plans to transfer the routing information via the debugger sockets, but ran out of time to implement it. I hope to get that into a later version (feel free to implement it if you'd like).

Greg

On Nov 6, 2009, at 11:43 PM, JiangJie wrote:
|