rundata_once UNIQUE clause

by Ethan Mallove

I see from pg_dump that perfbase setup has created the following table:

CREATE TABLE rundata_once (
  run_index integer,
  hostname character varying(256),
  os_name character varying(256) DEFAULT 'N/A'::character varying,
  os_version character varying(256) DEFAULT 'N/A'::character varying,
  platform_hardware character varying(256) DEFAULT 'N/A'::character varying,
  platform_type character varying(256) DEFAULT 'N/A'::character varying,
  platform_id character varying(256) DEFAULT 'N/A'::character varying,
  mpi_name character varying(256),
  mpi_version character varying(256),
  mpi_get_section_name character varying(256) DEFAULT 'N/A'::character varying,
  mpi_install_section_name character varying(256) DEFAULT 'N/A'::character varying,
  test_build_section_name character varying(256) DEFAULT 'N/A'::character varying,
  test_run_section_name character varying(256) DEFAULT 'N/A'::character varying,
  merge_stdout_stderr integer DEFAULT 0,
  environment text DEFAULT 'N/A'::text
);

But then there is:

INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
etc...

The same series of "once" values inserted not once, but many times.  Shouldn't
there be a line like:

UNIQUE (hostname, os_name, os_version, platform_hardware, platform_type,
platform_id, mpi_name, mpi_version, mpi_get_section_name,
mpi_install_section_name, test_build_section_name, test_run_section_name,
merge_stdout_stderr, environment)

- in the CREATE TABLE clause to avoid repeating data? So for each test_run,
perfbase would check the rundata_once table and say "I've already seen this
exact series of 'once' values, I will index this run to the matched
rundata_once.run_index that I found".

Or am I missing something?

-Ethan
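The proposal above amounts to a "get or create" lookup keyed on the full set of "once" values. Below is a minimal sketch of that idea. It uses Python's built-in `sqlite3` module instead of PostgreSQL, and a hypothetical, trimmed-down table `once_sets` with only three of the columns. It illustrates the suggestion; it is not how perfbase stores data.

```python
import sqlite3

# Hypothetical, trimmed-down version of rundata_once: one row per distinct
# combination of "once" values. NOT NULL matters because UNIQUE treats NULLs
# as distinct, so rows containing NULLs would never be seen as duplicates.
SCHEMA = """
CREATE TABLE once_sets (
    once_id     INTEGER PRIMARY KEY,
    hostname    TEXT NOT NULL,
    mpi_name    TEXT NOT NULL,
    mpi_version TEXT NOT NULL,
    UNIQUE (hostname, mpi_name, mpi_version)
);
"""


def get_or_create_once_id(conn, hostname, mpi_name, mpi_version):
    """Return the id of the matching once_sets row, inserting it if absent."""
    key = (hostname, mpi_name, mpi_version)
    # INSERT OR IGNORE (SQLite syntax) skips rows that would violate the
    # UNIQUE constraint; PostgreSQL 9.5+ offers ON CONFLICT DO NOTHING.
    conn.execute(
        "INSERT OR IGNORE INTO once_sets (hostname, mpi_name, mpi_version) "
        "VALUES (?, ?, ?)",
        key,
    )
    row = conn.execute(
        "SELECT once_id FROM once_sets "
        "WHERE hostname = ? AND mpi_name = ? AND mpi_version = ?",
        key,
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# The values below are made up for illustration.
first = get_or_create_once_id(conn, "foo", "open-mpi", "1.1")
second = get_or_create_once_id(conn, "foo", "open-mpi", "1.1")
print(first == second)  # True: the second run reuses the existing row
```

The `NOT NULL` columns are a deliberate design choice. In the original table, `hostname`, `mpi_name`, and `mpi_version` have no default, and a UNIQUE constraint alone would not catch duplicates that contain NULLs.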

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...


Re: rundata_once UNIQUE clause

by Joachim Worringen

Ethan Mallove wrote:
[...]

> INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
> INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
> INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
> etc...
>
> The same series of "once" values inserted not once, but many times.  Shouldn't
> there be a line like:
>
> UNIQUE (hostname, os_name, os_version, platform_hardware, platform_type,
> platform_id, mpi_name, mpi_version, mpi_get_section_name,
> mpi_install_section_name, test_build_section_name, test_run_section_name,
> merge_stdout_stderr, environment)
>
> - in the CREATE TABLE clause to avoid repeating data? So for each test_run,
> perfbase would check the rundata_once table and say "I've already seen this
> exact series of 'once' values, I will index this run to the matched
> rundata_once.run_index that I found".
>
> Or am I missing something?

Yes. ;-)

The "once" characteristic of a value means that only a single value
is stored for it per run. You can still create many runs with the
exact same set of parameters; in fact, this is very important in
many situations to get statistical information on the
reproducibility/stability of results.

You do not want to merge results for identical parameter sets into a
single run, because this would kill the 1:1 mapping of "run index" to
"execution of the experiment".

  Joachim

--
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
phone ++49/(0)228/324 08 17 - http://www.dolphinics.com
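To see why one run per execution matters, consider repeated executions with identical parameters. When each execution keeps its own run index, the spread across executions can be computed directly. Below is a small sketch using Python's `statistics` module. The run indices and latencies are hypothetical values chosen for illustration, not real measurements.

```python
import statistics

# Hypothetical latencies (in microseconds) from four executions of the same
# parameter set, keyed by run index: one run per execution.
latency_by_run = {1: 5.1, 2: 5.3, 3: 4.9, 4: 5.1}

values = list(latency_by_run.values())
mean = statistics.mean(values)
stdev = statistics.stdev(values)  # sample standard deviation across runs
print(f"{len(values)} runs: mean {mean:.2f} us, stdev {stdev:.2f} us")
```

If the four executions were collapsed into one run, the run index that tells them apart would be lost, and this calculation depends on exactly that grouping.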



Re: rundata_once UNIQUE clause

by Ethan Mallove

Joachim Worringen wrote On 07/26/06 11:28,:
> Ethan Mallove wrote:
> [...]
>
>>INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
>>INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
>>INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
>>etc...
>>

It seems like we would want to be able to point all the following runs to a
single row of "once" data.

INSERT INTO rundata VALUES (1, 'test_x', 'pass' ...
INSERT INTO rundata VALUES (1, 'test_y', 'pass' ...
INSERT INTO rundata VALUES (1, 'test_z', 'pass' ...

Why wouldn't we? Is it that you don't want to delete "once" data when you delete
its corresponding "multiple" data? Couldn't a check be put in place to prevent
this?
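The check asked about here could be expressed as a foreign key with `ON DELETE RESTRICT`. That option exists in both SQLite and PostgreSQL. It refuses to delete a shared "once" row while any run still references it. Below is a minimal sketch using Python's `sqlite3` module and two hypothetical tables, `once_sets` and `rundata_ref`. Neither table is part of perfbase's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE once_sets (
    once_id  INTEGER PRIMARY KEY,
    hostname TEXT NOT NULL UNIQUE
);
CREATE TABLE rundata_ref (
    run_index INTEGER PRIMARY KEY,
    once_id   INTEGER NOT NULL
              REFERENCES once_sets (once_id) ON DELETE RESTRICT
);
INSERT INTO once_sets (once_id, hostname) VALUES (1, 'foo');
INSERT INTO rundata_ref (run_index, once_id) VALUES (1, 1), (2, 1), (3, 1);
""")

refused = False
try:
    # Runs 1-3 still point at this row, so the delete must fail.
    conn.execute("DELETE FROM once_sets WHERE once_id = 1")
except sqlite3.IntegrityError as exc:
    refused = True
    print("delete refused:", exc)

# After the referencing runs are deleted, the shared row can be removed.
conn.execute("DELETE FROM rundata_ref WHERE once_id = 1")
conn.execute("DELETE FROM once_sets WHERE once_id = 1")
```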


Re: rundata_once UNIQUE clause

by Joachim Worringen

Ethan Mallove wrote:

>
> Joachim Worringen wrote On 07/26/06 11:28,:
>> Ethan Mallove wrote:
>> [...]
>>
>>> INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
>>> INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
>>> INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
>>> etc...
>>>
>
> It seems like we would want to be able to point all the following runs to a
> single row of "once" data.
>
> INSERT INTO rundata VALUES (1, 'test_x', 'pass' ...
> INSERT INTO rundata VALUES (1, 'test_y', 'pass' ...
> INSERT INTO rundata VALUES (1, 'test_z', 'pass' ...
>
> Why wouldn't we? Is it that you don't want to delete "once" data when you delete
> its corresponding "multiple" data? Couldn't a check be put in place to prevent
> this?

I don't see the requirement here. Each run has its own row in
rundata_once, and it makes no sense messing around with this.

  Joachim



Re: rundata_once UNIQUE clause

by Ethan Mallove

Joachim Worringen wrote On 07/26/06 13:09,:

> Ethan Mallove wrote:
>
>>Joachim Worringen wrote On 07/26/06 11:28,:
>>
>>>Ethan Mallove wrote:
>>>[...]
>>>
>>>
>>>>INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
>>>>INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
>>>>INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
>>>>etc...
>>>>
>>
>>It seems like we would want to be able to point all the following runs to a
>>single row of "once" data.
>>
>>INSERT INTO rundata VALUES (1, 'test_x', 'pass' ...
>>INSERT INTO rundata VALUES (1, 'test_y', 'pass' ...
>>INSERT INTO rundata VALUES (1, 'test_z', 'pass' ...
>>
>>Why wouldn't we? Is it that you don't want to delete "once" data when you delete
>>its corresponding "multiple" data? Couldn't a check be put in place to prevent
>>this?
>
>
> I don't see the requirement here. Each run has its own row in
> rundata_once, and it makes no sense messing around with this.


I think the problem is that we currently treat every mpirun command as a
test_run, when we should send them in batches (i.e., many mpiruns per test
run). As a result, we will have thousands of rundata rows with identical
rundata_once rows, and it makes no sense to repeat these rows when they can be
referenced via an integer index (we might as well put them all in a single
table). But whether we submit in batches or not shouldn't matter: it seems to
me perfbase should recognize that it is about to create an identical
rundata_once row and just reference the matching row from the rundata row.

-Ethan





Re: rundata_once UNIQUE clause

by Joachim Worringen

Ethan Mallove wrote:

> Joachim Worringen wrote On 07/26/06 13:09,:
>> I don't see the requirement here. Each run has its own row in
>> rundata_once, and it makes no sense messing around with this.
>
>
> I think the problem is that we currently treat every mpirun command as a
> test_run, when we should send them in batches (i.e., many mpiruns per test
> run). As a result, we will have thousands of rundata rows with identical
> rundata_once rows, and it makes no sense to repeat these rows when they can be
> referenced via an integer index (we might as well put them all in a single
> table). But whether we submit in batches or not shouldn't matter: it seems to
> me perfbase should recognize that it is about to create an identical
> rundata_once row and just reference the matching row from the rundata row.

While this would be possible to implement, I don't think the database
server will have a problem with a table of some ten million rows (each
maybe a kB of data), which is only on the order of 10 GB of data... I cannot
put this high on my priority list, but if someone submits a patch, including
functionality to upgrade existing experiments, we could integrate it.
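The size estimate can be checked with a back-of-the-envelope calculation. Below is a sketch that compares two layouts: one full "once" row per run, and one shared row per distinct parameter set with a small integer reference per run. The ten million runs and the 1 kB row size come from the message. Three details are assumptions: the 4-byte reference, the count of 1,000 distinct sets, and ignoring PostgreSQL's per-row and index overhead.

```python
def once_storage_bytes(n_runs, row_bytes, distinct_sets=None, ref_bytes=4):
    """Rough raw size of the "once" data, ignoring per-row and index overhead.

    distinct_sets=None models the current layout (one full row per run);
    otherwise each distinct set is stored once and every run keeps only an
    integer reference of ref_bytes to it.
    """
    if distinct_sets is None:
        return n_runs * row_bytes
    return distinct_sets * row_bytes + n_runs * ref_bytes


# Ten million runs of about 1 kB each, as in the message above.
per_run = once_storage_bytes(10_000_000, 1_000)
# Hypothetical: the same runs share only 1,000 distinct parameter sets.
shared = once_storage_bytes(10_000_000, 1_000, distinct_sets=1_000)
print(f"one row per run: {per_run / 1e9:.1f} GB, shared rows: {shared / 1e9:.3f} GB")
```

Under these assumptions, the current layout comes to roughly 10 GB. That is consistent with the view that the server can cope. Sharing rows would shrink the "once" data by more than two orders of magnitude, at the cost of a lookup on every insert.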

If you don't want the database to grow so fast, a simpler approach may be
to reduce the number of runs created by submitting multiple test outputs
as a single run. As far as I understand your setup, this is something
that perfbase supports today (more than one input file can be used to
create a single run).

  Joachim



Re: rundata_once UNIQUE clause

by Ethan Mallove

Joachim Worringen wrote On 07/26/06 17:27,:

> Ethan Mallove wrote:
>
>>Joachim Worringen wrote On 07/26/06 13:09,:
>>
>>>I don't see the requirement here. Each run has its own row in
>>>rundata_once, and it makes no sense messing around with this.
>>
>>
>>I think the problem is that we currently treat every mpirun command as a
>>test_run, when we should send them in batches (i.e., many mpiruns per test
>>run). As a result, we will have thousands of rundata rows with identical
>>rundata_once rows, and it makes no sense to repeat these rows when they can be
>>referenced via an integer index (we might as well put them all in a single
>>table). But whether we submit in batches or not shouldn't matter: it seems to
>>me perfbase should recognize that it is about to create an identical
>>rundata_once row and just reference the matching row from the rundata row.
>
>
> While this would be possible to implement, I don't think the database
> server will have a problem with a table of some ten million rows (each
> maybe a kB of data), which is only on the order of 10 GB of data... I cannot
> put this high on my priority list, but if someone submits a patch, including
> functionality to upgrade existing experiments, we could integrate it.
>
> If you don't want the database to grow so fast, a simpler approach may be
> to reduce the number of runs created by submitting multiple test outputs
> as a single run. As far as I understand your setup, this is something
> that perfbase supports today (more than one input file can be used to
> create a single run).
>

Could we also use the <set_separation> element? I think we would opt for this
over multiple files, since we submit input to perfbase over HTTP.



Re: rundata_once UNIQUE clause

by Joachim Worringen

Ethan Mallove wrote:
> Could we also use the <set_separation> element? I think we would opt for this
> over multiple files, since we submit input to perfbase over HTTP.

This will create a new run, which you don't want.

  Joachim



Re: rundata_once UNIQUE clause

by Ethan Mallove

Joachim Worringen wrote On 07/27/06 03:12,:
> Ethan Mallove wrote:
>
>>Could we also use the <set_separation> element? I think we would opt for this
>>over multiple files, since we submit input to perfbase over HTTP.
>
>
> This will create a new run, which you don't want.
>

I don't suppose it would be easy to implement something like:

<!ELEMENT set_separation      (match|regexp)>
<!ATTLIST set_separation runs (single|multiple) "multiple">




Re: rundata_once UNIQUE clause

by Joachim Worringen

Ethan Mallove wrote:

> Joachim Worringen wrote On 07/27/06 03:12,:
>> Ethan Mallove wrote:
>>
>>> Could we also use the <set_separation> element? I think we would opt for this
>>> over multiple files, since we submit input to perfbase over HTTP.
>>
>> This will create a new run, which you don't want.
>>
>
> I don't suppose it would be easy to implement something like:
>
> <!ELEMENT set_separation      (match|regexp)>
> <!ATTLIST set_separation runs (single|multiple) "multiple">

To achieve what exactly?

A 'set_separation' (confusing! it would better be called 'run_separation')
is used to create multiple runs from a single input file. I guess this
is not what you want? Instead, you want to reduce the number of runs to
be created for your given experiment output data?
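To make the distinction concrete, below is a hypothetical Python model of what a run separator does: each separator match starts a new run. This is not perfbase's implementation, which is configured in the XML input description. The `=== mpirun` output format and the function name `split_into_runs` are invented for illustration.

```python
import re


def split_into_runs(text, separator):
    """Split one input file into chunks, starting a new run at each separator."""
    runs, current = [], []
    for line in text.splitlines():
        if re.match(separator, line) and current:
            runs.append(current)  # close the run collected so far
            current = []
        current.append(line)
    if current:
        runs.append(current)
    return runs


# Hypothetical output of three mpirun invocations in a single file.
output = """=== mpirun test_x
result: pass
=== mpirun test_y
result: pass
=== mpirun test_z
result: fail"""

runs = split_into_runs(output, r"^=== mpirun")
print(len(runs))  # 3: each separator starts a new run
```

One input file yields three runs here, so each run still gets its own row of "once" data. That is the opposite of the goal of reducing the number of runs.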

What you probably want is a way to create multiple data sets within one
run, each covering, e.g., a different test and its result within the same
environment. To achieve this, define the required parameters and
results, and use the "store_set" attribute of a <named_location> (or a
<tabular_location>, which does this implicitly) to have perfbase store
the current data set and start gathering data for the next one.
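Below is a rough Python model of the "store the current data set, then start the next one" behavior described above. It is only an analogy. In perfbase, this is configured with `store_set` on a `<named_location>` or with a `<tabular_location>` in the input description, not written as code. The output format and the function name `collect_run` are hypothetical.

```python
import re

# Hypothetical output format: one "host:" line per file, then one
# "=== mpirun <test>" / "result: <status>" pair per test.
OUTPUT = """host: foo
=== mpirun test_x
result: pass
=== mpirun test_y
result: pass
=== mpirun test_z
result: fail"""


def collect_run(text):
    """Collect one run: run-level "once" values plus a list of data sets."""
    once = {}       # stored a single time for the whole run
    datasets = []   # one entry per stored data set
    current = {}    # values gathered for the data set in progress
    for line in text.splitlines():
        if m := re.match(r"host: (\S+)", line):
            once["hostname"] = m.group(1)
        elif m := re.match(r"=== mpirun (\S+)", line):
            current["test"] = m.group(1)
        elif m := re.match(r"result: (\S+)", line):
            current["result"] = m.group(1)
            # This match plays the role of store_set: store the current
            # data set, then start gathering the next one.
            datasets.append(current)
            current = {}
    return once, datasets


once, datasets = collect_run(OUTPUT)
print(once, len(datasets))
```

The result is a single set of "once" values and three data sets, all within one run. This matches the goal raised earlier in the thread: many tests sharing one row of "once" data.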

  Joachim
