I'm trying to add a new cluster to our network, and I've hit a snag
with job submission. We have things to the point where the existing
networks can submit jobs to the new cluster, but the new cluster
cannot submit any jobs. (We are running Torque and Maui.)
When I try to submit a job from the new cluster (a machine called
morph4), I see:
$ echo "sleep 10" | qsub -q morph
qsub: Bad UID for job execution MSG=ruserok failed validating
testuser/testuser from morph4
(morph is the new cluster.) testuser is set up so that it has the same
UID and GID on all of the machines in the network. If I give the same
command from a machine on the old cluster (submitting to morph), it runs.
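A minimal sketch of how the UID/GID match can be confirmed on each host. The helper name ids_for is made up for this post, and it only reports what the local host's `id` command returns; it has to be run on every machine and the output compared by hand.

```shell
# Print "name uid gid" for a user as seen by the local host.
# ids_for is a hypothetical helper name, not part of Torque.
ids_for() {
    printf '%s %s %s\n' "$1" "$(id -u "$1")" "$(id -g "$1")"
}

# Usage on each machine (morph4, the server, an old-cluster node):
#   ids_for testuser
```

If the three fields differ between morph4 and the server, the accounts are not really identical.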
The error in the torque server_log file is:
10/27/2010 14:34:45;0080;PBS_Server;Req;req_reject;Reject reply
code=15023(Bad UID for job execution MSG=ruserok failed validating
testuser/testuser from morph4), aux=0, type=QueueJob, from testuser@morph4
I've checked every "allow_node_submit" and "allow_proxy_user" setting
that I've read about, and they all seem to be set correctly. I have
probably missed one, but I don't know which.
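A minimal sketch of how these settings can be listed on the pbs_server host, assuming qmgr is on the PATH. The helper name show_submit_attrs is made up for this post, and including submit_hosts in the filter is an assumption that it may be relevant, not something confirmed above.

```shell
# Keep only the submission-related lines from `qmgr -c 'print server'` output.
# show_submit_attrs is a hypothetical helper name; it reads from stdin.
show_submit_attrs() {
    grep -E 'allow_node_submit|allow_proxy_user|submit_hosts'
}

# Usage on the server host:
#   qmgr -c 'print server' | show_submit_attrs
```

Any of these attributes that is unset simply does not appear in the output, which makes a missing one easier to spot.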
Also, the error message says it is validating "testuser/testuser".
Is that UID/GID? If so, it looks wrong: the UID should be testuser,
but the GID is set to something else.
Has anyone seen this error? Thanks in advance.