input conversions

input conversions

by Patrick Ohly-2

Hello,

I'd like to extract data from the output of different programs which use
different formats for the same value. To be more specific, I have in my
experiment description:

  <parameter>
    <name>day</name>
    <synopsis>date of the measurements</synopsis>
    <datatype>date</datatype>
  </parameter>

  <parameter>
    <name>time</name>
    <synopsis>time since UTC midnight</synopsis>
    <datatype>timeofday</datatype>
  </parameter>

and in one input (a syslog) something like:

Mar 13 12:26:48 host_foo prog: bar 0.000036500, 0.000001000, -10941, 0

and in another (ntp loopstats):
54172 30465.066 -0.001089505 15.497513 0.000009425 1.729744 4
^^^^^
Modified Julian Day (MJD)
      ^^^^^^^^^
      seconds since UTC midnight

For the syslog I use:

  <tabular_location rows="1">
    <regex>(\w+ \d+) (\d{2}:\d{2}:\d{2})(.*)prog: bar(.*)</regex>
    <tabular_value>
      <name>time</name>
      <pos>2</pos>
    </tabular_value>

and for ntp:
  <tabular_location columns="7">
    <tabular_value>
      <name>time</name>
      <pos>1</pos>
    </tabular_value>

The date is extracted from the file name because I didn't even dare to
attempt automatic reconciliation of the Modified Julian Day value with
"Mar 13". The <regex> is more complicated than necessary: I was trying
out whether I could reference individual groups later via the <pos>
element, but that's not how it works. Instead, the <regex> just selects
a line, just like <match>, and the line is then split at whitespace.
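
The behaviour described above (the pattern only selects the line, and <pos> then indexes whitespace-separated fields) can be illustrated with a short Python sketch. This is not perfbase's actual code: the function name `extract_field` and the zero-based reading of <pos> are assumptions, inferred from the two fragments above, where position 2 of the syslog line and position 1 of the loopstats line both land on the time field.

```python
import re

def extract_field(line, pos, pattern=None):
    """Return the whitespace-separated field at index `pos` (zero-based).

    If `pattern` is given, it only decides whether the line is selected;
    its capture groups are ignored, mirroring the behaviour described above.
    """
    if pattern is not None and re.search(pattern, line) is None:
        return None  # line not selected
    fields = line.split()
    return fields[pos] if pos < len(fields) else None

syslog = "Mar 13 12:26:48 host_foo prog: bar 0.000036500, 0.000001000, -10941, 0"
loopstats = "54172 30465.066 -0.001089505 15.497513 0.000009425 1.729744 4"
syslog_pattern = r"(\w+ \d+) (\d{2}:\d{2}:\d{2})(.*)prog: bar(.*)"

print(extract_field(syslog, 2, syslog_pattern))  # time field of the syslog line
print(extract_field(loopstats, 1))               # seconds field of the loopstats line
```

A simpler pattern such as `prog: bar` would select the same line, since the groups play no role in the field lookup.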

But even for the time of day I have doubts whether the XML fragments
above will work: currently they parse the input, but I have not tried
actually storing something in my database.

My hope is that <datatype>timeofday</datatype> will somehow turn on
smart input filtering so that both number of seconds and min:second are
handled correctly. Is that hope justified? If not, what effect does
setting <datatype> have?

Similar question for <datatype>date</datatype>: I use the ISO yyyy-mm-dd
notation. Is that going to be parsed okay?

Finally, in another parameter with <datatype>duration</datatype> I want
to store values which yet another program prints as e.g. "1.02us",
"2.3ms", "2s". I have seen that the "scaling" attribute is used in input
specifications to map input values into the unit used by the
experiment; is something like this possible if the base unit in the
input varies?

To be honest, the whole concept of "unit", "base_unit" is a bit unclear
to me at the moment. The DTD contains no documentation about this other
than specifying what the legal values are.

Any help is appreciated. In the meantime I'll take the low road and
massage my input data into a digestible format via preprocessing... ;-)

--
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...


Re: input conversions

by Joachim Worringen-4

Patrick Ohly wrote:
> But even for the time of day I have doubts whether the XML fragments
> above will work: currently they parse the input, but I have not tried
> actually storing something in my database.

You can always do a dry run with the '-t' option of the input command
(which you probably did).

If you create a new run to test how PostgreSQL treats your input, you
can safely delete the newly created run. To make this convenient, you
can use a specific synopsis, like:

1. input your data
   pb input --synopsis=test123 ...
2. find the matching run index
   ID=`pb find --synopsis=test123`
3. check what's in the run:
   pb info -r $ID --data=all
4. delete the run if not ok:
   pb delete -e ... -r $ID

> My hope is that <datatype>timeofday</datatype> will somehow turn on
> smart input filtering so that both number of seconds and min:second are
> handled correctly. Is that hope justified? If not, what effect does
> setting <datatype> have?
>
> Similar question for <datatype>date</datatype>: I use the ISO yyyy-mm-dd
> notation. Is that going to be parsed okay?

All <datatype> specifications are used to choose a PostgreSQL datatype
(see map pb_valid_dtypes in pb_common.py). The parsing is actually done
by PostgreSQL then, which is pretty flexible. For details refer to the
PostgreSQL documentation. If PostgreSQL fails to parse a value, you will
get an error message.

> Finally, in another parameter with <datatype>duration</datatype> I want
> to store values which yet another program prints as e.g. "1.02us",
> "2.3ms", "2s". I have seen that the "scaling" attribute is used in input
> specifications to map input values into the unit used by the
> experiment; is something like this possible if the base unit in the
> input varies?

No, this is not supported at the moment, but it would certainly be
possible to add.
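
Until perfbase supports this, values with varying unit suffixes can be normalised during preprocessing. A minimal Python sketch, assuming that only the suffixes `ns`, `us`, `ms`, and `s` occur and that the experiment stores the duration in seconds; the names `UNIT_TO_SECONDS` and `duration_to_seconds` are made up for this example:

```python
import re

# Multipliers that convert each suffix to seconds (assumed set of units).
UNIT_TO_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# A number followed by one of the known suffixes, e.g. "1.02us".
_DURATION_RE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*(ns|us|ms|s)\s*$")

def duration_to_seconds(text):
    """Convert strings such as '1.02us', '2.3ms' or '2s' to seconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError("unrecognised duration: %r" % text)
    value, unit = match.groups()
    return float(value) * UNIT_TO_SECONDS[unit]

for sample in ("1.02us", "2.3ms", "2s"):
    print(sample, "->", duration_to_seconds(sample))
```

With all values converted to a single unit, a fixed unit in the experiment description is sufficient and no per-value scaling is needed.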

> To be honest, the whole concept of "unit", "base_unit" is a bit unclear
> to me at the moment. The DTD contains no documentation about this other
> than specifying what the legal values are.

That's currently all that matters concerning the unit. It is set up this
way so that operators may eventually be able to calculate with units.
This is already partly done for scaling: if you use the scale operator
to scale by 1000, 'k' becomes 'M', etc.
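
The prefix shift described above can be pictured with a small Python sketch. It is only an illustration: the prefix list, the function name `shift_prefix`, and the rule that each factor of 1000 moves one step along the list are assumptions, not perfbase's implementation.

```python
import math

# SI prefixes in ascending steps of 1000; "" means no prefix.
PREFIXES = ["n", "u", "m", "", "k", "M", "G", "T"]

def shift_prefix(prefix, factor):
    """Return the prefix reached after scaling by `factor` (a power of 1000)."""
    steps = math.log(factor, 1000)
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("factor must be a power of 1000")
    index = PREFIXES.index(prefix) + int(round(steps))
    if not 0 <= index < len(PREFIXES):
        raise ValueError("no prefix available for this scaling")
    return PREFIXES[index]

print(shift_prefix("k", 1000))  # the case mentioned above: 'k' becomes 'M'
```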

> Any help is appreciated. In the meantime I'll take the low road and
> massage my input data into a digestible format via preprocessing... ;-)

That's a perfectly "legal" way to do it. You can use "-" with 'perfbase
input' to read from stdin.

  Joachim

--
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
phone ++49/(0)228/324 08 17 - http://www.dolphinics.com



Re: input conversions

by Patrick Ohly-2

On Tue, 2007-03-13 at 14:56 -0700, Joachim Worringen wrote:
> Patrick Ohly wrote:
> > But even for the time of day I have doubts whether the XML fragments
> > above will work: currently they parse the input, but I have not tried
> > actually storing something in my database.
>
> You can always do a dry run with the '-t' option of the input command
> (which you probably did).

Indeed ;-) Now I know that is not really a full test of the import
because as you said, PostgreSQL does a substantial part of the work of
interpreting the input data.

> > My hope is that <datatype>timeofday</datatype> will somehow turn on
> > smart input filtering so that both number of seconds and min:second are
> > handled correctly. Is that hope justified? If not, what effect does
> > setting <datatype> have?
> >
> > Similar question for <datatype>date</datatype>: I use the ISO yyyy-mm-dd
> > notation. Is that going to be parsed okay?
>
> All <datatype> specifications are used to choose a PostgreSQL datatype
> (see map pb_valid_dtypes in pb_common.py). The parsing is actually done
> by PostgreSQL then, which is pretty flexible. For details refer to the
> PostgreSQL documentation. If PostgreSQL fails to parse a value, you will
> get an error message.

That worked out pretty well; I only had to convert the "seconds since
UTC midnight" value myself because PostgreSQL did not grok it.
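
The conversion mentioned above can be sketched in Python as follows. The function names are hypothetical, and the output format (hh:mm:ss with milliseconds) is just one form that PostgreSQL's time type should accept. The second helper additionally assumes that the first loopstats column is a Modified Julian Day (day 0 = 1858-11-17), which is how the NTP documentation describes that file.

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # day 0 of the Modified Julian Day count

def seconds_to_timeofday(seconds):
    """Format seconds since midnight as 'hh:mm:ss.mmm'."""
    total_ms = int(round(float(seconds) * 1000))
    hours, rem = divmod(total_ms, 3600 * 1000)
    minutes, rem = divmod(rem, 60 * 1000)
    secs, millis = divmod(rem, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)

def mjd_to_iso(mjd):
    """Convert a Modified Julian Day number to an ISO yyyy-mm-dd string."""
    return (MJD_EPOCH + timedelta(days=int(mjd))).isoformat()

# First two fields of the loopstats sample line from the first message.
mjd_field, seconds_field = "54172 30465.066".split()
print(mjd_to_iso(mjd_field), seconds_to_timeofday(seconds_field))
```

For the sample line this yields a date that agrees with the "Mar 13" in the syslog excerpt, so the date would not have to be taken from the file name.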

However, the problem now is that with my x-axis using the "timeofday"
type gnuplot is given data in the "hh:mm:ss.subsecond" format, which it
does not understand without further options (everything is rounded to
the full hour). Adding these:
    <option>xdata time</option>
    <option>timefmt "%H:%M:%S"</option>
gets me further, but now gnuplot complains that it needs a full "using"
specification for time values:
gnuplot> plot '-'  title 'time_{offset} for host = knscsl012, peer = iknas-net77' axes x1y1 with linespoints lw 2 ps 2
                                                                                                                      ^
         line 0: Need full using spec for x time data

Assuming that I figure out what it wants, can I add this "using" to the
XML query or do I have to patch perfbase to generate this?

--
Best Regards

Patrick Ohly
Senior Software Engineer

Intel GmbH
Software & Solutions Group                
Hermuelheimer Strasse 8a                  Phone: +49-2232-2090-30
50321 Bruehl                              Fax: +49-2232-2090-29
Germany

Intel GmbH, Dornacher Strasse 1, 85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456
Ust.- IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052



Re: input conversions

by Joachim Worringen-4

Patrick Ohly wrote:

> However, the problem now is that with my x-axis using the "timeofday"
> type gnuplot is given data in the "hh:mm:ss.subsecond" format, which it
> does not understand without further options (everything is rounded to
> the full hour). Adding these:
>     <option>xdata time</option>
>     <option>timefmt "%H:%M:%S"</option>
> gets me further, but now gnuplot complains that it needs a full "using"
> specification for time values:
> gnuplot> plot '-'  title 'time_{offset} for host = knscsl012, peer = iknas-net77' axes x1y1 with linespoints lw 2 ps 2
>                                                                                                                       ^
>          line 0: Need full using spec for x time data
>
> Assuming that I figure out what it wants, can I add this "using" to the
> XML query or do I have to patch perfbase to generate this?

No, the plot command itself cannot be controlled from within the XML
query. I have never plotted real time values myself, so please try to
find out what gnuplot expects here. Maybe we can add it to all plot
command lines.

  Joachim

--
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
phone ++49/(0)228/324 08 17 - http://www.dolphinics.com



Re: input conversions

by Patrick Ohly-2

On Tue, 2007-03-13 at 17:55 -0700, Joachim Worringen wrote:

> Patrick Ohly wrote:
> > However, the problem now is that with my x-axis using the "timeofday"
> > type gnuplot is given data in the "hh:mm:ss.subsecond" format, which it
> > does not understand without further options (everything is rounded to
> > the full hour). Adding these:
> >     <option>xdata time</option>
> >     <option>timefmt "%H:%M:%S"</option>
> > gets me further, but now gnuplot complains that it needs a full "using"
> > specification for time values:
> > gnuplot> plot '-'  title 'time_{offset} for host = knscsl012, peer = iknas-net77' axes x1y1 with linespoints lw 2 ps 2
> >                                                                                                                       ^
> >          line 0: Need full using spec for x time data
> >
> > Assuming that I figure out what it wants, can I add this "using" to the
> > XML query or do I have to patch perfbase to generate this?
>
> No, the plot command itself cannot be controlled from within the XML
> query. I have never plotted real time values myself, so please try to
> find out what gnuplot expects here.
Nothing fancy actually: a simple "using 1:2" was enough. Anything
simpler like "1:" or ":" failed.

>  Maybe we can add it to all plot command
> lines.

Attached is the patch that currently works for me; I have not yet
checked whether the counting of columns in that patch really works. Does
it look right?

The patch is also incomplete: it should add the required additional
options automatically if the x-axis uses a PostgreSQL time value. I'm
not sure where to add/check that.

--
Best Regards

Patrick Ohly
Senior Software Engineer


[using.patch]

*** pb_plotutil.py.orig 2007-03-13 18:00:10.090442000 +0100
--- pb_plotutil.py 2007-03-13 18:16:36.418427000 +0100
***************
*** 354,361 ****
          plot_cmdline = "%s " % self.plot_cmd
          for dset in range(len(self.data_sets)):
              data_title = self._clean_str(self.data_titles[dset])
!             plot_cmdline += "'-' title '%s' %s with %s," % \
!                         (mk_enhanced_gp(data_title),
                           self._build_axis_str(dset),
                           self.plot_styles[dset])
          plot_cmdline = plot_cmdline[:-1] + "\n"
--- 354,363 ----
          plot_cmdline = "%s " % self.plot_cmd
          for dset in range(len(self.data_sets)):
              data_title = self._clean_str(self.data_titles[dset])
!             num_columns = len(self.data_sets[dset])
!             plot_cmdline += "'-' using 1%s:%d title '%s' %s with %s," % \
!                         (":" * (num_columns - 2), num_columns,
!                          mk_enhanced_gp(data_title),
                           self._build_axis_str(dset),
                           self.plot_styles[dset])
          plot_cmdline = plot_cmdline[:-1] + "\n"
***************
*** 437,444 ****
          plot_cmdline = "%s " % self.plot_cmd
          for dset in range(len(self.data_sets)):
              data_title = self._clean_str(self.data_titles[dset])
!             plot_cmdline += "'-'  title '%s' %s with %s %s," % \
!                         (mk_enhanced_gp(data_title),
                           self._build_axis_str(dset),
                           self.plot_styles[dset], elmt_fmt)
          plot_cmdline = rstrip(plot_cmdline, ',') + "\n"
--- 439,448 ----
          plot_cmdline = "%s " % self.plot_cmd
          for dset in range(len(self.data_sets)):
              data_title = self._clean_str(self.data_titles[dset])
!             num_columns = len(self.data_sets[dset])
!             plot_cmdline += "'-' using 1%s:%d title '%s' %s with %s %s," % \
!                         (":" * (num_columns - 2), num_columns,
!                          mk_enhanced_gp(data_title),
                           self._build_axis_str(dset),
                           self.plot_styles[dset], elmt_fmt)
          plot_cmdline = rstrip(plot_cmdline, ',') + "\n"
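
To see what the patched format expression produces, the following Python sketch reproduces only the `using` part of the string for a few column counts. The helper name `using_clause` is hypothetical, and the sketch takes the column count as given: whether `len(self.data_sets[dset])` really yields the number of columns (rather than, say, the number of rows) is not confirmed anywhere in this thread.

```python
def using_clause(num_columns):
    """Build the gnuplot 'using' clause the same way the patch does."""
    return "using 1%s:%d" % (":" * (num_columns - 2), num_columns)

for n in (2, 3, 4):
    print(n, "->", using_clause(n))
```

For two columns this gives the "using 1:2" that was found to work. For more columns the intermediate entries are left empty; the gnuplot documentation describes empty entries as defaulting to their position in the list, so "1::3" should be read like "1:2:3", but that is worth verifying, in particular for 3D plots.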




Re: input conversions

by Joachim Worringen-4

Patrick Ohly wrote:
> Nothing fancy actually: a simple "using 1:2" was enough. Anything
> simpler like "1:" or ":" failed.
>
>>  Maybe we can add it to all plot command
>> lines.
>
> Attached is the patch that currently works for me; I have not yet
> checked whether the counting of columns in that patch really works. Does
> it look right?

Thanks - doesn't look plain wrong ;-). I'll have to test it (or you can
do this, too: just enter the test directory and call "make" to run the
test suite). I'm not sure if it works with 3D plots.

> The patch is also incomplete: it should add the required additional
> options automatically if the x-axis uses a PostgreSQL time value. I'm
> not sure where to add/check that.

That's a different story. IIRC, this has to be done in pb_output.py as
the datatype is no longer available in pb_plotutil.py. I added this to
the issue tracker.

  Joachim

--
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
phone ++49/(0)228/324 08 17 - http://www.dolphinics.com
