input conversions

Hello,

I'd like to extract data from the output of different programs which
use different formats for the same value. To be more specific, I have in
my experiment description:

  <parameter>
    <name>day</name>
    <synopsis>date of the measurements</synopsis>
    <datatype>date</datatype>
  </parameter>
  <parameter>
    <name>time</name>
    <synopsis>time since UTC midnight</synopsis>
    <datatype>timeofday</datatype>
  </parameter>

and in one input (a syslog) something like:

  Mar 13 12:26:48 host_foo prog: bar 0.000036500, 0.000001000, -10941, 0

and in another (ntp loopstats):

  54172 30465.066 -0.001089505 15.497513 0.000009425 1.729744 4
  ^^^^^ Julian Day
        ^^^^^^^^^ seconds since UTC midnight

For the syslog I use:

  <tabular_location rows="1">
    <regex>(\w+ \d+) (\d{2}:\d{2}:\d{2})(.*)prog: bar(.*)</regex>
    <tabular_value>
      <name>time</name>
      <pos>2</pos>
    </tabular_value>

and for ntp:

  <tabular_location columns="7">
    <tabular_value>
      <name>time</name>
      <pos>1</pos>
    </tabular_value>

The date is extracted from the file name because I didn't even dare to
attempt automatic reconciliation of Julian day value with "Mar 13".

The <regex> is more complicated than necessary - I was trying out
whether I could reference individual groups later via the <pos> element,
but that's not how it worked. Instead the <regex> just selects a line
just like <match> and then it is split at white spaces.

But even for the time of day I have doubts whether the XML fragments
above will work: currently they parse the input, but I have not tried
actually storing something in my database.

My hope is that <datatype>timeofday</datatype> will somehow turn on
smart input filtering so that both number of seconds and min:second are
handled correctly. Is that hope justified? If not, what effect does
setting <datatype> have?

Similar question for <datatype>date</datatype>: I use the ISO yyyy-mm-dd
notation. Is that going to be parsed okay?

Finally, in another parameter with <datatype>duration</datatype> I want
to store values which yet another program prints as e.g. "1.02us",
"2.3ms", "2s". I have seen that the "scaling" attribute is used in input
specifications to map input values into the unit used by the
experiment; is something like this possible if the base unit in the
input varies?

To be honest, the whole concept of "unit", "base_unit" is a bit unclear
to me at the moment. The DTD contains no documentation about this other
than specifying what the legal values are.

Any help is appreciated. In the meantime I'll take the low road and
massage my input data into a digestible format via preprocessing... ;-)

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
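
For the preprocessing route mentioned above, values such as "1.02us", "2.3ms" and "2s" could be normalized to one unit before they reach perfbase. The following is only a minimal sketch: the function name, the set of accepted suffixes and the choice of seconds as the common unit are assumptions made for illustration, not anything perfbase provides.

```python
import re
from decimal import Decimal

# Factors that convert a suffixed value into seconds. Using seconds as the
# common unit is an assumption of this sketch; pick whatever unit the
# experiment description declares.
_FACTORS = {
    "s": Decimal("1"),
    "ms": Decimal("1e-3"),
    "us": Decimal("1e-6"),
    "ns": Decimal("1e-9"),
}

# A number (optional sign, optional fraction) followed by one known suffix.
_DURATION_RE = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)\s*(ns|us|ms|s)\s*$")


def duration_to_seconds(text):
    """Return a duration such as "2.3ms" as a Decimal number of seconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    value, unit = match.groups()
    # Decimal keeps the printed digits exact instead of going through float.
    return Decimal(value) * _FACTORS[unit]
```

With the values normalized like this, a single fixed unit (and, if needed, a fixed "scaling") in the input description would be enough.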

Re: input conversions

Patrick Ohly wrote:
> But even for the time of day I have doubts whether the XML fragments
> above will work: currently they parse the input, but I have not tried
> actually storing something in my database.

You can always do a dry run with the '-t' option of the input command
(which you probably did). If you create a new run to test how PostgreSQL
treats your input, you can safely delete the newly created run. To make
this convenient, you can use a specific synopsis, like:

1. input your data
     pb input --synopsis=test123 ...
2. find the matching run index
     ID=`pb find --synopsis=test123`
3. check what's in the run:
     pb info -r $ID --data=all
4. delete the run if not ok:
     pb delete -e ... -r $ID

> My hope is that <datatype>timeofday</datatype> will somehow turn on
> smart input filtering so that both number of seconds and min:second are
> handled correctly. Is that hope justified? If not, what effect does
> setting <datatype> have?
>
> Similar question for <datatype>date</datatype>: I use the ISO yyyy-mm-dd
> notation. Is that going to be parsed okay?

All <datatype> specifications are used to choose a PostgreSQL datatype
(see map pb_valid_dtypes in pb_common.py). The parsing is actually done
by PostgreSQL then, which is pretty flexible. For details refer to the
PostgreSQL documentation. If parsing by PostgreSQL fails, you will get an
error message.

> Finally, in another parameter with <datatype>duration</datatype> I want
> to store values which yet another program prints as e.g. "1.02us",
> "2.3ms", "2s". I have seen that the "scaling" attribute is used in input
> specifications to map input values into the unit used by the
> experiment; is something like this possible if the base unit in the
> input varies?

No, this is not supported, but would certainly be possible.

> To be honest, the whole concept of "unit", "base_unit" is a bit unclear
> to me at the moment. The DTD contains no documentation about this other
> than specifying what the legal values are.

That's currently all that matters concerning the unit. It is set up this
way so that operators can possibly calculate with units. This is partly
done with the scaling: if you use the scale operator to scale by 1000,
'k' becomes 'M' etc.

> Any help is appreciated. In the meantime I'll take the low road and
> massage my input data into a digestible format via preprocessing... ;-)

That's a perfectly "legal" way to do it. You can use "-" with 'perfbase
input' to read from stdin.

Joachim

-- 
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
phone ++49/(0)228/324 08 17 - http://www.dolphinics.com
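
As an illustration of the stdin route, here is a small filter sketch for the ntp loopstats lines quoted earlier in the thread. It rewrites the first two fields into an ISO date and an hh:mm:ss.mmm time of day and leaves the rest untouched. It assumes that the day number is a Modified Julian Day (for 54172 that gives 2007-03-13, which agrees with the dates in this thread); the function names are hypothetical and nothing here is part of perfbase.

```python
import sys
from datetime import date, timedelta

# Day 0 of the Modified Julian Day count (assumption: loopstats uses MJD).
MJD_EPOCH = date(1858, 11, 17)


def mjd_to_iso(mjd):
    """Convert a Modified Julian Day number to an ISO yyyy-mm-dd string."""
    return (MJD_EPOCH + timedelta(days=int(mjd))).isoformat()


def seconds_to_timeofday(seconds):
    """Convert seconds since UTC midnight to hh:mm:ss.mmm."""
    # Work in whole milliseconds so that rounding cannot yield "60.000".
    millis = int(round(float(seconds) * 1000))
    hours, millis = divmod(millis, 3600 * 1000)
    minutes, millis = divmod(millis, 60 * 1000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)


def convert_line(line):
    """Rewrite the first two whitespace-separated fields of one line."""
    fields = line.split()
    if len(fields) < 2:
        return line.rstrip("\n")  # pass short or empty lines through
    fields[0] = mjd_to_iso(fields[0])
    fields[1] = seconds_to_timeofday(fields[1])
    return " ".join(fields)


if __name__ == "__main__":
    for line in sys.stdin:
        print(convert_line(line))
```

The output of such a filter can be piped into 'perfbase input' with "-" as the input file, as described above. The input description would then read the day from position 0 and the time from position 1.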

Re: input conversions

On Tue, 2007-03-13 at 14:56 -0700, Joachim Worringen wrote:
> Patrick Ohly wrote:
> > But even for the time of day I have doubts whether the XML fragments
> > above will work: currently they parse the input, but I have not tried
> > actually storing something in my database.
>
> You can always do a dry run with the '-t' option of the input command
> (which you probably did).

Indeed ;-) Now I know that is not really a full test of the import
because as you said, PostgreSQL does a substantial part of the work of
interpreting the input data.

> > My hope is that <datatype>timeofday</datatype> will somehow turn on
> > smart input filtering so that both number of seconds and min:second are
> > handled correctly. Is that hope justified? If not, what effect does
> > setting <datatype> have?
> >
> > Similar question for <datatype>date</datatype>: I use the ISO yyyy-mm-dd
> > notation. Is that going to be parsed okay?
>
> All <datatype> specifications are used to choose a PostgreSQL datatype
> (see map pb_valid_dtypes in pb_common.py). The parsing is actually done
> by PostgreSQL then, which is pretty flexible. For details refer to the
> PostgreSQL documentation. If parsing by PostgreSQL fails, you will get an
> error message.

That worked out pretty well, I only had to convert the "seconds since
UTC midnight" myself because that wasn't grokked by PostgreSQL.

However, the problem now is that with my x-axis using the "timeofday"
type gnuplot is given data in the "hh:mm:ss.subsecond" format, which it
does not understand without further options (everything is rounded to
the full hour). Adding these:

  <option>xdata time</option>
  <option>timefmt "%H:%M:%S"</option>

gets me further, but now gnuplot complains that it needs a full "using"
specification for time values:

  gnuplot> plot '-' title 'time_{offset} for host = knscsl012, peer = iknas-net77' axes x1y1 with linespoints lw 2 ps 2
           ^
           line 0: Need full using spec for x time data

Assuming that I figure out what it wants, can I add this "using" to the
XML query or do I have to patch perfbase to generate this?

-- 
Best Regards

Patrick Ohly
Senior Software Engineer

Intel GmbH
Software & Solutions Group
Hermuelheimer Strasse 8a      Phone: +49-2232-2090-30
50321 Bruehl                  Fax: +49-2232-2090-29
Germany

Intel GmbH, Dornacher Strasse 1, 85622 Feldkirchen/Muenchen Germany
Registered office: Feldkirchen bei Muenchen
Managing directors: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Court of registration: Muenchen HRB 47456
Ust.- IdNr./VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

Re: input conversions

Patrick Ohly wrote:
> However, the problem now is that with my x-axis using the "timeofday"
> type gnuplot is given data in the "hh:mm:ss.subsecond" format, which it
> does not understand without further options (everything is rounded to
> the full hour). Adding these:
>   <option>xdata time</option>
>   <option>timefmt "%H:%M:%S"</option>
> gets me further, but now gnuplot complains that it needs a full "using"
> specification for time values:
>   gnuplot> plot '-' title 'time_{offset} for host = knscsl012, peer = iknas-net77' axes x1y1 with linespoints lw 2 ps 2
>            ^
>            line 0: Need full using spec for x time data
>
> Assuming that I figure out what it wants, can I add this "using" to the
> XML query or do I have to patch perfbase to generate this?

No, the plot command itself cannot be controlled from within the XML
query. I never plotted real time values myself; so please try to see
what gnuplot wants to have here. Maybe we can add it to all plot command
lines.

Joachim

Re: input conversions

On Tue, 2007-03-13 at 17:55 -0700, Joachim Worringen wrote:
> Patrick Ohly wrote:
> > However, the problem now is that with my x-axis using the "timeofday"
> > type gnuplot is given data in the "hh:mm:ss.subsecond" format, which it
> > does not understand without further options (everything is rounded to
> > the full hour). Adding these:
> >   <option>xdata time</option>
> >   <option>timefmt "%H:%M:%S"</option>
> > gets me further, but now gnuplot complains that it needs a full "using"
> > specification for time values:
> >   gnuplot> plot '-' title 'time_{offset} for host = knscsl012, peer = iknas-net77' axes x1y1 with linespoints lw 2 ps 2
> >            ^
> >            line 0: Need full using spec for x time data
> >
> > Assuming that I figure out what it wants, can I add this "using" to the
> > XML query or do I have to patch perfbase to generate this?
>
> No, the plot command itself cannot be controlled from within the XML
> query. I never plotted real time values myself; so please try to see
> what gnuplot wants to have here.

Nothing fancy actually: a simple "using 1:2" was enough. Anything
simpler like "1:" or ":" failed.

> Maybe we can add it to all plot command
> lines.

Attached is the patch which currently works for me; I have not tried yet
whether the counting of columns in that patch really works. Does it look
right?

The patch is also incomplete: it should add the required additional
options automatically if the x-axis uses a PostgreSQL time value. I'm
not sure where to add/check that.

-- 
Best Regards

Patrick Ohly

[using.patch]

*** pb_plotutil.py.orig 2007-03-13 18:00:10.090442000 +0100
--- pb_plotutil.py      2007-03-13 18:16:36.418427000 +0100
***************
*** 354,361 ****
          plot_cmdline = "%s " % self.plot_cmd
          for dset in range(len(self.data_sets)):
              data_title = self._clean_str(self.data_titles[dset])
!             plot_cmdline += "'-' title '%s' %s with %s," % \
!                             (mk_enhanced_gp(data_title), self._build_axis_str(dset),
                               self.plot_styles[dset])
  
          plot_cmdline = plot_cmdline[:-1] + "\n"
--- 354,363 ----
          plot_cmdline = "%s " % self.plot_cmd
          for dset in range(len(self.data_sets)):
              data_title = self._clean_str(self.data_titles[dset])
!             num_columns = len(self.data_sets[dset])
!             plot_cmdline += "'-' using 1%s:%d title '%s' %s with %s," % \
!                             (":" * (num_columns - 2), num_columns,
!                              mk_enhanced_gp(data_title), self._build_axis_str(dset),
                               self.plot_styles[dset])
  
          plot_cmdline = plot_cmdline[:-1] + "\n"
***************
*** 437,444 ****
          plot_cmdline = "%s " % self.plot_cmd
          for dset in range(len(self.data_sets)):
              data_title = self._clean_str(self.data_titles[dset])
!             plot_cmdline += "'-' title '%s' %s with %s %s," % \
!                             (mk_enhanced_gp(data_title), self._build_axis_str(dset),
                               self.plot_styles[dset], elmt_fmt)
  
          plot_cmdline = rstrip(plot_cmdline, ',') + "\n"
--- 439,448 ----
          plot_cmdline = "%s " % self.plot_cmd
          for dset in range(len(self.data_sets)):
              data_title = self._clean_str(self.data_titles[dset])
!             num_columns = len(self.data_sets[dset])
!             plot_cmdline += "'-' using 1%s:%d title '%s' %s with %s %s," % \
!                             (":" * (num_columns - 2), num_columns,
!                              mk_enhanced_gp(data_title), self._build_axis_str(dset),
                               self.plot_styles[dset], elmt_fmt)
  
          plot_cmdline = rstrip(plot_cmdline, ',') + "\n"
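
To see what the format string in the patch produces, the expression can be tried in isolation. The sketch below copies it into a standalone function and compares it with a variant that writes out every column number. Both function names are made up for this illustration, and it takes the column count as given, which is the part the patch author had not verified yet.

```python
def using_spec_from_patch(num_columns):
    """Build the "using" spec with the same format expression as the patch."""
    return "using 1%s:%d" % (":" * (num_columns - 2), num_columns)


def using_spec_explicit(num_columns):
    """Alternative sketch: name every column, leaving no entry empty."""
    columns = [str(col) for col in range(1, num_columns + 1)]
    return "using " + ":".join(columns)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print("%d columns: %-12s | %s"
              % (n, using_spec_from_patch(n), using_spec_explicit(n)))
```

For two columns both give "using 1:2", the form reported to work in this thread. For three or more columns the patch leaves the middle entries empty (for example "using 1::3"). The thread does not say whether gnuplot accepts that for time data; since shorter forms such as "1:" were rejected, spelling out every column may be the safer variant.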

Re: input conversions

Patrick Ohly wrote:
> Nothing fancy actually: a simple "using 1:2" was enough. Anything
> simpler like "1:" or ":" failed.
>
>> Maybe we can add it to all plot command
>> lines.
>
> Attached is the patch which currently works for me; I have not tried yet
> whether the counting of columns in that patch really works. Does it look
> right?

Thanks - doesn't look plain wrong ;-). I'll have to test it (or you can
do this, too: just enter the test directory and call "make" to run the
test suite). I'm not sure if it works with 3D plots.

> The patch is also incomplete: it should add the required additional
> options automatically if the x-axis uses a PostgreSQL time value. I'm
> not sure where to add/check that.

That's a different story. IIRC, this has to be done in pb_output.py as
the datatype is no longer available in pb_plotutil.py. I added this to
the issue tracker.

Joachim
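
The automatic part could look roughly like the sketch below: a lookup from the datatype of the x-axis parameter to the extra options that were added by hand earlier in the thread. This is not perfbase code. The function, the table and the place where it would be called (presumably somewhere in pb_output.py, where the datatype is still known) are assumptions, and the format for "date" presumes that the values arrive in ISO yyyy-mm-dd notation.

```python
# Hypothetical table: perfbase <datatype> -> gnuplot time format.
# "timeofday" uses the format from this thread; "date" assumes ISO output.
TIME_FORMATS = {
    "timeofday": "%H:%M:%S",
    "date": "%Y-%m-%d",
}


def time_axis_options(x_datatype):
    """Return the extra <option> strings needed for a time-valued x-axis.

    For any other datatype the list is empty, so existing plots would
    stay unchanged.
    """
    fmt = TIME_FORMATS.get(x_datatype)
    if fmt is None:
        return []
    return ["xdata time", 'timefmt "%s"' % fmt]


if __name__ == "__main__":
    for opt in time_axis_options("timeofday"):
        print("<option>%s</option>" % opt)
```

Returning the options in the same form as the <option> elements keeps the sketch independent of how the plot script is finally written out.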