rundata_once UNIQUE clause

I see from pg_dump that perfbase setup has created the following table:

  CREATE TABLE rundata_once (
      run_index integer,
      hostname character varying(256),
      os_name character varying(256) DEFAULT 'N/A'::character varying,
      os_version character varying(256) DEFAULT 'N/A'::character varying,
      platform_hardware character varying(256) DEFAULT 'N/A'::character varying,
      platform_type character varying(256) DEFAULT 'N/A'::character varying,
      platform_id character varying(256) DEFAULT 'N/A'::character varying,
      mpi_name character varying(256),
      mpi_version character varying(256),
      mpi_get_section_name character varying(256) DEFAULT 'N/A'::character varying,
      mpi_install_section_name character varying(256) DEFAULT 'N/A'::character varying,
      test_build_section_name character varying(256) DEFAULT 'N/A'::character varying,
      test_run_section_name character varying(256) DEFAULT 'N/A'::character varying,
      merge_stdout_stderr integer DEFAULT 0,
      environment text DEFAULT 'N/A'::text
  );

But then there is:

  INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
  INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
  INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
  etc...

The same series of "once" values inserted not once, but many times. Shouldn't there be a line like:

  UNIQUE (hostname, os_name, os_version, platform_hardware, platform_type,
          platform_id, mpi_name, mpi_version, mpi_get_section_name,
          mpi_install_section_name, test_build_section_name, test_run_section_name,
          merge_stdout_stderr, environment);

- in the CREATE TABLE clause to avoid repeating data? So for each test_run, perfbase would check the rundata_once table and say "I've already seen this exact series of 'once' values, I will index this run to the matched rundata_once.run_index that I found".

Or am I missing something?

-Ethan

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
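The lookup-or-insert behaviour proposed above can be sketched as follows. This is only an illustration under stated assumptions, not how perfbase works: SQLite (from the Python standard library) stands in for PostgreSQL so the example runs anywhere, three columns stand in for the full list, and the names `once_values`, `runs`, `once_index_for` and `add_run` are hypothetical. Note that `run_index` cannot be part of the deduplicated table, so the sketch moves the "once" values into a table of their own.

```python
import sqlite3

# Hypothetical, shortened schema: three "once" columns stand in for all of them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE once_values (
    once_index INTEGER PRIMARY KEY,
    hostname   TEXT NOT NULL,
    os_name    TEXT NOT NULL DEFAULT 'N/A',
    mpi_name   TEXT NOT NULL,
    -- NOT NULL matters: a UNIQUE constraint treats NULLs as distinct values.
    UNIQUE (hostname, os_name, mpi_name)
);
CREATE TABLE runs (
    run_index  INTEGER PRIMARY KEY,
    once_index INTEGER NOT NULL REFERENCES once_values(once_index)
);
""")


def once_index_for(conn, hostname, os_name, mpi_name):
    """Return the index of the matching row, inserting it only if it is new."""
    values = (hostname, os_name, mpi_name)
    conn.execute(
        "INSERT OR IGNORE INTO once_values (hostname, os_name, mpi_name) "
        "VALUES (?, ?, ?)",
        values,
    )
    row = conn.execute(
        "SELECT once_index FROM once_values "
        "WHERE hostname = ? AND os_name = ? AND mpi_name = ?",
        values,
    ).fetchone()
    return row[0]


def add_run(conn, hostname, os_name, mpi_name):
    """Create a new run that points at the shared row of 'once' values."""
    once_index = once_index_for(conn, hostname, os_name, mpi_name)
    cur = conn.execute("INSERT INTO runs (once_index) VALUES (?)", (once_index,))
    return cur.lastrowid


# Three runs with identical "once" values share one once_values row.
run_ids = [add_run(conn, "foo", "bar", "baz") for _ in range(3)]
once_rows = conn.execute("SELECT COUNT(*) FROM once_values").fetchone()[0]
```

Every run still gets its own `run_index` in this layout; only the repeated values are shared.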
|
|
Re: rundata_once UNIQUE clause

Ethan Mallove wrote:
[...]
> INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
> INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
> INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
> etc...
>
> The same series of "once" values inserted not once, but many times. Shouldn't
> there be a line like:
>
> UNIQUE (hostname, os_name, os_version, platform_hardware, platform_type,
> platform_id, mpi_name, mpi_version, mpi_get_section_name,
> mpi_install_section_name, test_build_section_name, test_run_section_name,
> merge_stdout_stderr, environment);
>
> - in the CREATE TABLE clause to avoid repeating data? So for each test_run,
> perfbase would check the rundata_once table and say "I've already seen this
> exact series of 'once' values, I will index this run to the matched
> rundata_once.run_index that I found".
>
> Or am I missing something?

Yes. ;-)

The "once" characteristic of a value means that only a single content for this value is stored per run. You can still create many runs with the exact same set of parameters. In fact, this is very important in many situations to get statistical information on the reproducibility/stability of results.

You do not want to merge results for identical parameter sets into a single run, because this would kill the 1:1 mapping of "run index" to "execution of the experiment".

Joachim

--
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
phone ++49/(0)228/324 08 17 - http://www.dolphinics.com
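To illustrate the point about reproducibility, here is a minimal sketch of why separate runs with identical parameters are useful. The data and the function name `spread_by_parameters` are made up for illustration and are not part of perfbase; the sketch only shows that keeping one run per execution lets you compute a spread over repeated executions.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical data: (run_index, "once" parameters, measured result).
runs = [
    (1, ("foo", "bar"), 10.0),
    (2, ("foo", "bar"), 12.0),
    (3, ("foo", "bar"), 14.0),
]


def spread_by_parameters(runs):
    """Group results by parameter set and report (count, mean, stdev)."""
    grouped = defaultdict(list)
    for _run_index, params, result in runs:
        grouped[params].append(result)
    return {
        params: (
            len(values),
            mean(values),
            # stdev needs at least two samples; report 0.0 for a single run.
            stdev(values) if len(values) > 1 else 0.0,
        )
        for params, values in grouped.items()
    }


summary = spread_by_parameters(runs)
```

If the three executions were merged into one run, only a single result would remain and the spread could not be computed.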
|
|
Re: rundata_once UNIQUE clause

Joachim Worringen wrote on 07/26/06 11:28:
> Ethan Mallove wrote:
> [...]
>
>> INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
>> INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
>> INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
>> etc...
>>

It seems like we would want to be able to point all the following runs to a single row of "once" data.

  INSERT INTO rundata VALUES (1, 'test_x', 'pass' ...
  INSERT INTO rundata VALUES (1, 'test_y', 'pass' ...
  INSERT INTO rundata VALUES (1, 'test_z', 'pass' ...

Why wouldn't we? Is it that you don't want to delete "once" data when you delete its corresponding "multiple" data? Couldn't a check be put in place to prevent this?

>> The same series of "once" values inserted not once, but many times. Shouldn't
>> there be a line like:
>>
>> UNIQUE (hostname, os_name, os_version, platform_hardware, platform_type,
>> platform_id, mpi_name, mpi_version, mpi_get_section_name,
>> mpi_install_section_name, test_build_section_name, test_run_section_name,
>> merge_stdout_stderr, environment);
>>
>> - in the CREATE TABLE clause to avoid repeating data? So for each test_run,
>> perfbase would check the rundata_once table and say "I've already seen this
>> exact series of 'once' values, I will index this run to the matched
>> rundata_once.run_index that I found".
>>
>> Or am I missing something?
>
> Yes. ;-)
>
> The "once" characteristic of a value means that only a single content
> for this value is stored per run. You can still create many runs with
> the exact same set of parameters. In fact, this is very important in
> many situations to get statistical information on the
> reproducibility/stability of results.
>
> You do not want to merge results for identical parameter sets into a
> single run, because this would kill the 1:1 mapping of "run index" to
> "execution of the experiment".
>
> Joachim
|
|
Re: rundata_once UNIQUE clause

Ethan Mallove wrote:
> Joachim Worringen wrote on 07/26/06 11:28:
>> Ethan Mallove wrote:
>> [...]
>>
>>> INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
>>> INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
>>> INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
>>> etc...
>>>
>
> It seems like we would want to be able to point all the following runs to a
> single row of "once" data.
>
> INSERT INTO rundata VALUES (1, 'test_x', 'pass' ...
> INSERT INTO rundata VALUES (1, 'test_y', 'pass' ...
> INSERT INTO rundata VALUES (1, 'test_z', 'pass' ...
>
> Why wouldn't we? Is it that you don't want to delete "once" data when you delete
> its corresponding "multiple" data? Couldn't a check be put in place to prevent
> this?

I don't see the requirement here. Each run has its own row in rundata_once, and it makes no sense messing around with this.

Joachim
|
|
Re: rundata_once UNIQUE clause

Joachim Worringen wrote on 07/26/06 13:09:
> Ethan Mallove wrote:
>
>> Joachim Worringen wrote on 07/26/06 11:28:
>>
>>> Ethan Mallove wrote:
>>> [...]
>>>
>>>> INSERT INTO rundata_once VALUES (1, 'foo', 'bar', ... )
>>>> INSERT INTO rundata_once VALUES (2, 'foo', 'bar', ... )
>>>> INSERT INTO rundata_once VALUES (3, 'foo', 'bar', ... )
>>>> etc...
>>>>
>>
>> It seems like we would want to be able to point all the following runs to a
>> single row of "once" data.
>>
>> INSERT INTO rundata VALUES (1, 'test_x', 'pass' ...
>> INSERT INTO rundata VALUES (1, 'test_y', 'pass' ...
>> INSERT INTO rundata VALUES (1, 'test_z', 'pass' ...
>>
>> Why wouldn't we? Is it that you don't want to delete "once" data when you delete
>> its corresponding "multiple" data? Couldn't a check be put in place to prevent
>> this?
>
> I don't see the requirement here. Each run has its own row in
> rundata_once, and it makes no sense messing around with this.

I think the problem is that we currently consider every mpirun command a test_run, when we should send them in batch (thus, many mpiruns for each test run). We will have thousands of rundata rows with identical rundata_once rows, and it makes no sense to repeat these rows when they can be referenced via an integer index (might as well put them all in a single table). But whether we submit in batch or not shouldn't matter. It seems to me that perfbase should recognize that it is about to create an identical rundata_once row and just reference the matched row from the rundata row.

-Ethan

>
> Joachim
|
|
Re: rundata_once UNIQUE clause

Ethan Mallove wrote:
> Joachim Worringen wrote on 07/26/06 13:09:
>> I don't see the requirement here. Each run has its own row in
>> rundata_once, and it makes no sense messing around with this.
>
> I think the problem is that we currently consider every mpirun command a
> test_run, when we should send them in batch (thus, many mpiruns for each test
> run). We will have thousands of rundata rows with identical rundata_once
> rows, and it makes no sense to repeat these rows when they can be
> referenced via an integer index (might as well put them all in a single table).
> But whether we submit in batch or not shouldn't matter. It seems to me that
> perfbase should recognize that it is about to create an identical
> rundata_once row and just reference the matched row from the rundata row.

While this would be possible to implement, I don't think the database server will have a problem with a table with some ten million rows (each maybe a kB of data), which is only a few GB of data... I cannot put this high on my priority list, but if someone submits a patch incl. functionality to upgrade existing experiments, we could integrate it.

If you don't want the database to grow so fast, a simpler approach may be to reduce the number of runs to be created by submitting multiple test outputs as a single run. As far as I understand your setup, this is something that perfbase supports today (more than one input file can be used to create a single run).

Joachim
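The size estimate in the reply above can be checked with a few lines of arithmetic. This is a rough sketch under the stated assumptions (ten million rows, about 1 kB each, no index or storage overhead); the function names are hypothetical and the batch size of 100 is an arbitrary illustration.

```python
import math


def table_size_gib(rows, bytes_per_row):
    """Raw payload size in GiB, ignoring index and storage overhead."""
    return rows * bytes_per_row / 2**30


def runs_after_batching(test_outputs, outputs_per_run):
    """Number of runs (and rundata_once rows) if outputs are batched."""
    return math.ceil(test_outputs / outputs_per_run)


# Ten million rows of roughly 1 kB each.
unbatched = table_size_gib(10_000_000, 1024)

# Assumed batching: 100 test outputs submitted as a single run.
batched_rows = runs_after_batching(10_000_000, 100)
batched = table_size_gib(batched_rows, 1024)
```

Under these assumptions the unbatched table comes to roughly 9.5 GiB, and batching 100 outputs per run shrinks the row count to 100,000.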
|
|
Re: rundata_once UNIQUE clause

Joachim Worringen wrote on 07/26/06 17:27:
> Ethan Mallove wrote:
>
>> Joachim Worringen wrote on 07/26/06 13:09:
>>
>>> I don't see the requirement here. Each run has its own row in
>>> rundata_once, and it makes no sense messing around with this.
>>
>> I think the problem is that we currently consider every mpirun command a
>> test_run, when we should send them in batch (thus, many mpiruns for each test
>> run). We will have thousands of rundata rows with identical rundata_once
>> rows, and it makes no sense to repeat these rows when they can be
>> referenced via an integer index (might as well put them all in a single table).
>> But whether we submit in batch or not shouldn't matter. It seems to me that
>> perfbase should recognize that it is about to create an identical
>> rundata_once row and just reference the matched row from the rundata row.
>
> While this would be possible to implement, I don't think the database
> server will have a problem with a table with some ten million rows (each
> maybe a kB of data), which is only a few GB of data... I cannot put
> this high on my priority list, but if someone submits a patch incl.
> functionality to upgrade existing experiments, we could integrate it.
>
> If you don't want the database to grow so fast, a simpler approach may
> be to reduce the number of runs to be created by
> submitting multiple test outputs as a single run. As far as I understand
> your setup, this is something that perfbase supports today (more than
> one input file can be used to create a single run).
>

Could we also use the <set_separation> element? I think we would opt for this over multiple files since we input to perfbase over HTTP.

> Joachim
|
|
Re: rundata_once UNIQUE clause

Ethan Mallove wrote:
> Could we also use the <set_separation> element? I think we would opt for this
> over multiple files since we input to perfbase over HTTP.

This will create a new run, which you don't want.

Joachim
|
|
Re: rundata_once UNIQUE clause

Joachim Worringen wrote on 07/27/06 03:12:
> Ethan Mallove wrote:
>
>> Could we also use the <set_separation> element? I think we would opt for this
>> over multiple files since we input to perfbase over HTTP.
>
> This will create a new run, which you don't want.
>

I don't suppose it would be easy to implement something like:

  <!ELEMENT set_separation (match|regexp)>
  <!ATTLIST set_separation runs (single|multiple) "multiple">

> Joachim
|
|
Re: rundata_once UNIQUE clause

Ethan Mallove wrote:
> Joachim Worringen wrote on 07/27/06 03:12:
>> Ethan Mallove wrote:
>>
>>> Could we also use the <set_separation> element? I think we would opt for this
>>> over multiple files since we input to perfbase over HTTP.
>>
>> This will create a new run, which you don't want.
>>
>
> I don't suppose it would be easy to implement something like:
>
> <!ELEMENT set_separation (match|regexp)>
> <!ATTLIST set_separation runs (single|multiple) "multiple">

To achieve what exactly? A 'set_separation' (a confusing name; it would better be called 'run_separation') is used to create multiple runs from a single input file. I guess this is not what you want? Instead, you want to reduce the number of runs to be created for your given experiment output data?

What you probably want is a way to create multiple data sets within one run, each covering, e.g., a different test and its result within the same environment. To achieve this, define the required parameters and results, and use the "store_set" attribute of a <named_location> (or a <tabular_location>, which does this implicitly) to have perfbase store the current data set and start gathering data for the next one.

Joachim
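The idea of several data sets within one run can be modelled roughly as below. This is a conceptual sketch only and does not use perfbase's input description syntax: the `key = value` input format, the names `ONCE_KEYS`, `STORE_SET_KEY` and `parse_run`, and the choice of `result` as the value that triggers storing are all assumptions made for illustration.

```python
# Hypothetical configuration: which keys are stored once per run, and which
# key closes the current data set (playing the role of "store_set").
ONCE_KEYS = {"hostname", "mpi_name"}
STORE_SET_KEY = "result"


def parse_run(lines):
    """Collect several data sets from one input into a single run."""
    run = {"once": {}, "datasets": []}
    current = {}
    for line in lines:
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in ONCE_KEYS:
            # Stored a single time for the whole run.
            run["once"][key] = value
        else:
            current[key] = value
            if key == STORE_SET_KEY:
                # Store the current data set and start gathering the next one.
                run["datasets"].append(current)
                current = {}
    return run


output = [
    "hostname = foo",
    "mpi_name = bar",
    "test = test_x",
    "result = pass",
    "test = test_y",
    "result = pass",
    "test = test_z",
    "result = fail",
]
run = parse_run(output)
```

Here one run holds the environment values once and three data sets, one per test, which is the reduction in runs discussed in this thread.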