pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings

by Chris Harwell

Greetings freeipmi users,

I've really enjoyed using freeipmi - it is a great tool. I
particularly like how the host range syntax works and simplifies
certain tasks.

I've recently run into a case where freeipmi fails and hope you can
offer some help or advice.

This fails:
  ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output --quiet-readings
Any range whose second number is 319 also fails.

These invocations work:
  ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output
  ipmi-sensors -g Fan -h xxxx[0001-318]-lom --consolidate-output --quiet-readings
  ipmi-sensors -g Fan -h xxxx[0001-319]-lom
  ipmi-sensors -g Fan -h xxxx[0001-318]-lom --consolidate-output
  ipmi-sensors -g Fan -h xxxx[0001-319]-lom --quiet-readings

When it fails, the output looks like this:
$ ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output --quiet-readings
pstdout_launch: unknown internal error

I encounter this in each of the versions I could check quickly: 0.6.5,
0.7.12, and 0.7.13.
:bin$ ipmi-sensors -V
ipmi-sensors - 0.7.13
Copyright (C) 2003-2008 FreeIPMI Core Team
This program is free software; you may redistribute it under the terms of
the GNU General Public License.  This program has absolutely no warranty.
drdenws02:bin$ ipmi-sensors -g Fan -h drdb[0001-319]-lom -u ADMIN -p ADMIN --consolidate-output --quiet-readings
pstdout_launch: unknown internal error

The debug output is copious; the last part looks like this:
xxxx0317-lom: IPMI Command Data:
xxxx0317-lom: ------------------
xxxx0317-lom: [              3Ch] = cmd[ 8b]
xxxx0317-lom: [               0h] = comp_code[ 8b]
xxxx0317-lom: IPMI Trailer:
xxxx0317-lom: --------------
xxxx0317-lom: [              23h] = checksum2[ 8b]
pstdout_launch: unknown internal error

Please advise: am I running into a known limitation, or just using
this wrong? Is there other information I ought to provide?

Thanks in advance,
Chris Harwell


_______________________________________________
Freeipmi-users mailing list
Freeipmi-users@...
http://lists.gnu.org/mailman/listinfo/freeipmi-users

Re: pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings

by Al Chu11

Hey Chris,

On Wed, 2009-10-14 at 11:53 -0400, Chris Harwell wrote:
> Greetings freeipmi users,
>
> I've really enjoyed using freeipmi - it is a great tool. I
> particularly like how the host range syntax works and simplifies
> certain tasks.

Thanks.

> Please advise  - am I running into a known limitation or just using
> this wrong? Is there other information I ought to provide?

In all likelihood there is some corner case in the hostrange parsing.
I'll take a look into it and get back to you if I need any more info.

Thanks,
Al

--
Albert Chu
chu11@...
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




Re: pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings

by Al Chu11

Hey Chris,

I've reproduced this problem in the underlying hostlist library.  I'm
working with the maintainer of the library to figure out if there is a
bug or if there is a hostrange assumption issue.  I noticed your range
input was:

0001-319

which internally in hostlist will lead to

0001-0319

Is your intent for xxxx[0001-319] to lead to xxxx0318, xxxx0319, etc.?

Inputting the latter also seems to cause an error, so there probably is a
bug somewhere, be it an input checking bug or an output bug.
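
For illustration, the padding behaviour described above can be sketched
in Python. This is only a sketch under stated assumptions: expand() is a
hypothetical helper, not part of FreeIPMI or the hostlist library, and it
assumes the zero-padding width is taken from the lower bound of the range,
as the 0001-319 to 0001-0319 example suggests.

```python
import re


def expand(hostrange):
    """Expand 'prefix[lo-hi]suffix' into a list of host names.

    Hypothetical helper: handles a single bracketed numeric range only.
    The padding width is assumed to come from the lower bound.
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", hostrange)
    if m is None:
        # Not a bracketed range; treat it as a single host name.
        return [hostrange]
    prefix, lo, hi, suffix = m.groups()
    width = len(lo)  # "0001" gives width 4, so 319 is rendered as 0319
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(lo), int(hi) + 1)
    ]


hosts = expand("xxxx[0001-319]-lom")
print(len(hosts), hosts[0], hosts[-1])
```

Under that assumption, xxxx[0001-319]-lom and xxxx[0001-0319]-lom expand
to the same 319 host names.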

Al


Re: pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings

by Al Chu11

Hey Chris,

Just spoke to the maintainer of the internal "hostlist" library.  Short
term, I can build you a beta that can get around the problem.  However,
you will not get a nice

xxxx[0001-319]-lom

output, you would instead get

xxxx0001-lom,xxxx0002-lom,xxxx0003-lom,...

The issue was a buffer overflow.  My buffer was 4096 chars, which sure
enough overflows after about 316 nodes in your format (13 chars * 316 >
4096).
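
As a rough check of that arithmetic, here is a small Python sketch. It
assumes each host is written out as a 12-character name such as
xxxx0001-lom, that names are joined with commas, and that one extra byte
is needed for a terminating NUL. BUFFER_SIZE and bytes_needed are
illustrative names, not taken from the FreeIPMI source.

```python
BUFFER_SIZE = 4096  # buffer size quoted in the message above


def bytes_needed(host_count, prefix="xxxx", width=4, suffix="-lom"):
    """Bytes needed to hold host_count names as one comma-separated string.

    Assumption: one trailing byte is reserved for a C string terminator.
    """
    hosts = [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(1, host_count + 1)
    ]
    return len(",".join(hosts)) + 1


# Smallest host count whose unconsolidated list no longer fits.
first_overflow = next(
    n for n in range(1, 10000) if bytes_needed(n) > BUFFER_SIZE
)
print(first_overflow, bytes_needed(first_overflow))
```

Each name plus its separator costs 13 bytes, so the list stops fitting
once 13 * n exceeds 4096.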

Now why was there a buffer overflow?  The hostrange library currently
can't deal with hostrange "building" (which is what is done when outputs
are being consolidated) when the host has a suffix (e.g. "-lom"), so the
consolidated output ends up as the full comma-separated list of hosts.
However, I spoke to the author and if there is only 1 "numeric range",
such as in your case, perhaps that can be handled as a special case,
since there is no ambiguity about how to build up the hostrange.  The
suffix case is an awkward one, most commonly seen with a format
like:

node1-eth2

In the above, it is impossible to know whether the '1' or the '2' is the
hostrange part (although normal users can easily guess that it's the '1'
and not the '2', code-wise you really never know).
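
To make the ambiguity concrete, here is a minimal Python sketch of the
"single numeric range" special case mentioned above. It is not the
hostlist implementation: consolidate() is a hypothetical function that
only builds a range when a host name contains exactly one run of digits,
and passes every other name through unchanged.

```python
import re
from collections import defaultdict


def consolidate(hosts):
    """Collapse host names with exactly one digit run into bracket ranges.

    Hypothetical sketch: names with zero or several digit runs (such as
    node1-eth2) are ambiguous and are returned as they were given.
    """
    groups = defaultdict(list)  # (prefix, suffix, width) -> numbers
    passthrough = []
    for host in hosts:
        runs = list(re.finditer(r"\d+", host))
        if len(runs) != 1:
            passthrough.append(host)
            continue
        m = runs[0]
        key = (host[:m.start()], host[m.end():], len(m.group()))
        groups[key].append(int(m.group()))

    out = []
    for (prefix, suffix, width), nums in groups.items():
        nums = sorted(set(nums))
        pieces = []
        start = prev = nums[0]
        # The trailing None flushes the last run of consecutive numbers.
        for n in nums[1:] + [None]:
            if n is not None and n == prev + 1:
                prev = n
                continue
            lo, hi = f"{start:0{width}d}", f"{prev:0{width}d}"
            pieces.append(lo if start == prev else f"{lo}-{hi}")
            start = prev = n
        out.append(f"{prefix}[{','.join(pieces)}]{suffix}")
    return out + passthrough


lom_hosts = [f"xxxx{n:04d}-lom" for n in range(1, 320)]
print(consolidate(lom_hosts))
print(consolidate(["node1-eth2", "node2-eth2"]))
```

With one digit run per name the range can be rebuilt without guessing;
with two, the sketch declines to choose and leaves the names alone.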

On your end, a short-term way to deal with this problem and still get
clean output is perhaps to come up with a different host alias?  Here at
LLNL, we prefix all IPMI addresses with a unique prefix.

Hope that helps short term, and hopefully we can get a fix longer term.

Al


Re: pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings

by Al Chu11

Hey Chris,

I went ahead and put a beta up.

freeipmi-0.7.14.beta0.tar.gz

at

http://ftp.gluster.com/pub/freeipmi/qa-release/

Want to give it a shot?

Al


Re: pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings

by Chris Harwell

Thanks, the provided 0.7.14.beta0 worked for my test case:

ipmi-sensors -h xxxx[0001-0576]-lom -g Fan --consolidate-output --quiet-readings

For the record, in case it helps others, another (longer) workaround
is to use IP address ranges, which provide nice consolidation too:

ipmi-sensors -h xxx.xxx.16.[10-255],xxx.xxx.17.[0-255],xxx.xxx.18.[0-73] -g Fan --consolidate-output --quiet-readings

That isn't nearly as nice as Al's beta0, but it might help in some
other situation :>



Re: pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings

by Al Chu11

Hey Chris,

Great.  I'll release a new FreeIPMI, which will likely be the last 0.7.X
release before 0.8.1.

Al
