sorting two-dimensional data

[resent because I hadn't confirmed my subscription yet when sending; the original email is probably waiting for approval - sorry for the noise]

Hello,

I am trying to get data displayed in 3D with chunk size and number of processes as the independent variables (as in the PMB 3D example), but with my dependent variable coming out of a division operation. My query is attached.

The input data were several text files which contained the output of a normal IMB run without special options, followed by more IMB runs with different Intel Trace Collector configurations. Each file had one fixed processor count. The purpose is to present the slowdown caused by tracing; the result will be presented in a paper at the LCI conference in May.

Because the text files were (intentionally) imported in increasing processor count order, the "raw_text" output already looks right:

    <output ... target="raw_text" dimensions="3" color="no">
      <input label="title">src.baseline</input>
    </output>

    # S_chunk[byte] N_proc[process] T_avg[us]
    4 4 1.84
    8 4 1.85
    ...
    2097152 4 13257.39
    4194304 4 30191.67
    4 8 11.83
    8 8 11.01
    ...
    2097152 256 26673.05
    4194304 256 57646.34

However, trying with target="gnuplot" I only get the error message:

    #* ERROR: 'can not plot one-dimensional input data in 3D (missing sort-<operator>?)'

apparently coming from this code:

    if self.ndims == 3 and (blank_cnt == 0 or blank_cnt == data_cnt - 1):
        # It makes no sense to plot one-dimensional datasets in 3D. The reason for this situation
        # will most likely be unsorted data. However, this is not a bullet-proof check: data that
        # is sorted in some (but not the right) way will not trigger this exception.
        raise DataError, "can not plot one-dimensional input data in 3D (missing sort-<operator>?)"

What is the right way to sort the src.baseline or op.slowdown streams? I tried applying the sort operator as shown in the PMB/pmb_query_3d.xml example but that didn't change anything.

Thanks for your advice, it is much needed ;-}

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although I am an employee of Intel, the statements I make here in no way represent Intel's position on the issue, nor am I authorized to speak on behalf of Intel on this matter.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
Re: sorting two-dimensional data

Patrick Ohly wrote:
> What is the right way to sort the src.baseline or op.slowdown streams? I
> tried applying the sort operator as shown in the PMB/pmb_query_3d.xml
> example but that didn't change anything. Thanks for your advice, it is
> much needed ;-}

Hi Patrick,

(actually responding to you from the Intel site at Dupont, WA ;-))

I suspect the problem is that for 3D data, gnuplot requires the data blocks (which result from a 3D matrix being presented in a 2D file) to be separated by blank lines. These blank lines need to be inserted by perfbase's gnuplot "driver", as the sources and operators of course do not care about this.

For the driver to be able to insert these blank lines correctly, the data needs to be sorted by the first column (S_chunk in your case). Thus, use 'value="S_chunk"' in your sort operator, and things should work fine.

Joachim

-- 
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
phone ++49/(0)228/324 08 17 - http://www.dolphinics.com
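The blank-line requirement described in this reply can be illustrated with a minimal sketch. It is not perfbase's actual driver code; it only assumes that a new gnuplot data block starts whenever the first column changes. The function name `split_into_blocks` is hypothetical, and the sample values are taken from the raw_text output quoted in the first message.

```python
def split_into_blocks(rows):
    """Yield text lines for gnuplot, with a blank line between data blocks.

    `rows` is an iterable of (x, y, z) tuples. A new block is assumed to
    start whenever x changes, so the rows should be sorted by x first.
    """
    previous_x = None
    for x, y, z in rows:
        if previous_x is not None and x != previous_x:
            yield ""  # a blank line separates two gnuplot data blocks
        yield f"{x} {y} {z}"
        previous_x = x


# Rows sorted by S_chunk (first column): one block per chunk size.
sorted_rows = [(4, 4, 1.84), (4, 8, 11.83), (8, 4, 1.85), (8, 8, 11.01)]
lines = list(split_into_blocks(sorted_rows))
print("\n".join(lines))

# The same rows in import order (sorted by N_proc): S_chunk changes on
# every row, so a blank line follows each row except the last.
unsorted_rows = [(4, 4, 1.84), (8, 4, 1.85), (4, 8, 11.83), (8, 8, 11.01)]
unsorted_lines = list(split_into_blocks(unsorted_rows))
print(unsorted_lines.count(""), "blank lines for", len(unsorted_rows), "rows")
```

Under that assumption, the import-order case produces one blank line fewer than there are rows, which is the `blank_cnt == data_cnt - 1` condition in the check quoted in the first message.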
Re: sorting two-dimensional data

On Thu, 2007-03-01 at 15:17 -0800, Joachim Worringen wrote:
> Patrick Ohly wrote:
> > What is the right way to sort the src.baseline or op.slowdown streams? I
> > tried applying the sort operator as shown in the PMB/pmb_query_3d.xml
> > example but that didn't change anything. Thanks for your advice, it is
> > much needed ;-}
>
> Hi Patrick,
>
> (actually responding you from the Intel site at Dupont, WA ;-))

Then you are closer to the machine that I am benchmarking on than I am. What a small world ;-)

> I suspect the problem is that for 3D-data, gnuplot requires the data
> blocks (which come by having a 3D-matrix being presented in a 2D-file)
> be separated by blank lines. These blank lines need to be inserted by
> perfbase's gnuplot "driver", as the sources and operators do of course
> not care for this.
>
> For the driver to be able to insert these blank lines correctly, the
> data needs to be sorted by the first column (S_chunk) in your case.
> Thus, use 'value="S_chunk"' in your sort operator, and things should
> work fine.

That gets me around the error message, but apparently the sort is not stable and randomly rearranges tuples which have the same chunk size:

    4 128 161.6
    4 256 191.66
    4 16 46.75
    4 4 1.84
    4 32 100.39
    4 64 126.77
    4 8 11.83
    8 64 126.99
    8 256 225.55
    8 128 159.29
    8 8 11.01
    8 16 46.48
    8 32 100.37
    8 4 1.85
    16 4 1.85
    16 16 46.84
    16 32 100.26
    16 256 201.84
    16 8 11.55
    16 64 127.03
    16 128 159.34
    32 128 159.29
    32 64 126.89
    32 16 46.74
    32 8 11.3
    32 256 196.72
    32 4 1.97
    32 32 100.21
    ...

This causes gnuplot to draw lines between (4,128,161.6) and (16,4,1.85), etc. (or so it seems - the 3D plot is too mixed up to tell for sure). I suspect that what is needed is a sort operator with more than one key.

Because my data is already sorted the right way, I can work around this by exchanging my x- and y-axes without relying on a sort operator - but only for the original source. The result of the div operator does not seem to preserve the ordering, and I get an error regardless of how I arrange my axes.

That would have been only a kludge anyway; let me see whether I can get further by enhancing the sort operator.

-- 
Best Regards, Patrick Ohly
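The effect of a second sort key can be sketched in plain Python. This is only an illustration of the ordering problem, not how perfbase sorts; the tuples are a subset of the listing in the message above, and the variable names are hypothetical.

```python
# (S_chunk, N_proc, T_avg) tuples as they came out of the single-key sort:
# grouped by S_chunk, but with N_proc in arbitrary order inside each group.
rows = [
    (4, 128, 161.6), (4, 256, 191.66), (4, 16, 46.75), (4, 4, 1.84),
    (8, 64, 126.99), (8, 256, 225.55), (8, 8, 11.01), (8, 4, 1.85),
]

# Sorting on the pair (S_chunk, N_proc) orders the rows inside each
# S_chunk block as well, so every block walks along N_proc in one direction.
by_chunk_then_proc = sorted(rows, key=lambda row: (row[0], row[1]))

for s_chunk, n_proc, t_avg in by_chunk_then_proc:
    print(s_chunk, n_proc, t_avg)
```

With a single key, the order of rows sharing the same S_chunk is left to whatever produces the data; adding N_proc as a second key removes that ambiguity.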
Re: sorting two-dimensional data - patches

On Fri, 2007-03-02 at 07:46 +0000, Patrick Ohly wrote:
> This causes gnuplot to draw lines between (4,128,161.6) and (16,4,1.85),
> etc. (or so it seems - the 3d plot is too mixed up to tell for sure). I
> suspect that what is needed is a sort operator with more than one key.
>
> Because my data is already sorted the right way, I can work around this
> by exchanging my x- and y-axises without relying on a sort operator -
> but only for the original source. The result of the div operator does
> not seem to preserve the ordering and I get an error regardless how I
> arrange my axises. That would have been only a kludge anyway, let me see
> whether I can get further by enhancing the sort operator.

Okay, that wasn't too hard. The sorting is done in a SQL ORDER BY clause, which accepts more than one key. It was only the value validation which had to be patched - see multi-order.patch. Note that I am not quite certain when the self.result_infos array is used in the lines that I modified; it always seemed to be empty in my tests. Please check carefully whether I broke anything...

In that patch I have also updated the comments in the dtd. While I was at it I added a comment about how parameters are chosen for the independent axes. This was something that I struggled with when I got started with perfbase; to my knowledge it wasn't mentioned explicitly anywhere. I hope I got it right, please correct me if I didn't.

Also attached is a patch (cut.patch) to get the postgres startup working on RH AS2.1. I did not investigate in detail, but the

    wc -l $PBLOG | cut -d " " -f 1

only produced an empty result there; I found it easier to avoid the file name in the wc output right away like this

    cat $PBLOG | wc -l

Thanks for this excellent tool. I figure I have spent about as much time learning how it works as I would have for writing my own analysis scripts, but the output is much nicer and querying more flexible with perfbase.
Besides, it will be simpler the next time :-)

-- 
Best Regards, Patrick Ohly

[cut.patch]

*** bin/perfbase 2007-02-22 14:54:34.970987000 +0100
--- /home/pohly/bin/perfbase 2007-03-01 10:06:53.989312000 +0100
***************
*** 23,28 ****
--- 23,30 ----
  # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
  #
+ PATH=/Projects/software/IA32-LIN/gnuplot-4.0.0/bin:$PATH
+
  # These settings will to be adapted for the individual installation by the setup script.
  PB_HOME=/home/pohly/bin
  PG_CMD_PATH=/Projects/software/IA32-LIN/postgresql-8.2.3/bin
***************
*** 30,36 ****
  PYTHON=/Projects/software/IA32-LIN/python/bin/python
  PIDOF=/sbin/pidof
! DB_HOST=/var/run/postgresql
  DB_PORT=5432
  DB_USER=pohly
--- 32,38 ----
  PYTHON=/Projects/software/IA32-LIN/python/bin/python
  PIDOF=/sbin/pidof
! DB_HOST=/tmp/
  DB_PORT=5432
  DB_USER=pohly
***************
*** 228,249 ****
  # Try to make sure that we look at the most recent log output.
  # This is not bullet-proof! Better idea anyone?
  touch $PBLOG
! n_lines=`wc -l $PBLOG | cut -d " " -f 1`
  $PG_CMD_PATH/postmaster $pm_port -i -D $PG_DATA_PATH >>$PBLOG 2>&1 &
  sleep 5
! while [ `wc -l $PBLOG | cut -d " " -f 1` -le $n_lines ] ; do
  # No lines have yet been appended, wait some more time.
  sleep 5
  done
  # Some lines have been appended. Now, we wait until nothing has been
  # appended for some time.
! n_lines=`wc -l $PBLOG | cut -d " " -f 1`
  sleep 5
! while [ `wc -l $PBLOG | cut -d " " -f 1` -gt $n_lines ] ; do
  sleep 5
! n_lines=`wc -l $PBLOG | cut -d " " -f 1`
  done
  pm_state=`tail -2 $PBLOG | grep "database system is ready"`
--- 230,251 ----
  # Try to make sure that we look at the most recent log output.
  # This is not bullet-proof! Better idea anyone?
  touch $PBLOG
! n_lines=`cat $PBLOG | wc -l`
  $PG_CMD_PATH/postmaster $pm_port -i -D $PG_DATA_PATH >>$PBLOG 2>&1 &
  sleep 5
! while [ `cat $PBLOG | wc -l` -le $n_lines ] ; do
  # No lines have yet been appended, wait some more time.
  sleep 5
  done
  # Some lines have been appended. Now, we wait until nothing has been
  # appended for some time.
! n_lines=`cat $PBLOG | wc -l`
  sleep 5
! while [ `cat $PBLOG | wc -l` -gt $n_lines ] ; do
  sleep 5
! n_lines=`cat $PBLOG | wc -l`
  done
  pm_state=`tail -2 $PBLOG | grep "database system is ready"`

[multi-sort.patch]

*** dtd/pb_query.dtd.orig 2007-03-02 09:49:10.646707000 +0100
--- dtd/pb_query.dtd 2007-03-02 10:02:24.313915000 +0100
***************
*** 69,74 ****
--- 69,79 ----
  yaxis (top|bottom|auto) "bottom" >
  <!ELEMENT output (input+,option*,filename?,endian?,intsize?,floatsize?,stringsep?,tics*)>
+ <!-- dimensions chooses between 2d and 3d plot;
+ for 3d plots the input data must be sorted at least
+ on the first column and usually also on the second to
+ generate a grid - the sort operator with two keys in the value
+ attribute can be used for this -->
  <!ATTLIST output target (raw_binary|raw_text|netcdf|hdf5|gnuplot|grace|latex|xml) "raw_text"
  sweep_combine (match|alltoall) "match"
  type (graphs|points|boxes|bars|steps) "graphs"
***************
*** 153,158 ****
--- 158,165 ----
  <!-- id: unique id for reference from other objects -->
  <!-- value: numerical or string argument (as required by the -->
  <!-- specific operator, i.e. scale factor) -->
+ <!-- sort - a comma or white-space separated list of one -->
+ <!-- or more keys to sort by -->
  <!-- option: option for the operator -->
  <!-- match: determine how datasets from different sources are -->
  <!-- matched to apply an operation on them. I.e., if -->
***************
*** 233,239 ****
  <!ELEMENT parameter (value,sweep?,filter*)>
  <!-- 'boolean' defines how multiple filter are combined. -->
  <!-- 'show' defines which elements of a parameter value will show -->
! <!-- up in the output of the query. -->
  <!-- 'style' further defines how data from a filter shows up in -->
  <!-- the output. For a value "v" with filter content "c", "full" -->
  <!-- prints everything ('v = c');"content" prints only the -->
--- 240,248 ----
  <!ELEMENT parameter (value,sweep?,filter*)>
  <!-- 'boolean' defines how multiple filter are combined. -->
  <!-- 'show' defines which elements of a parameter value will show -->
! <!-- up in the output of the query; those set to "auto" are used -->
! <!-- as the varying values on the independent axis(es) of 2d and -->
! <!-- 3d plots -->
  <!-- 'style' further defines how data from a filter shows up in -->
  <!-- the output. For a value "v" with filter content "c", "full" -->
  <!-- prints everything ('v = c');"content" prints only the -->
*** bin/pb_operators.py 2006-09-13 14:32:26.000000000 +0200
--- /home/pohly/bin/pb_operators.py 2007-03-02 09:48:36.838740000 +0100
***************
*** 2604,2623 ****
              self.sort_by = self.param_infos[0][0]
          att = mk_label(elmnt_tree.get('value'), all_nodes)
          if att != None:
!             # find value by which the sorting should be done
              self.sort_by = None
!             for ri in self.result_infos:
!                 if ri[0] == att:
!                     self.sort_by = att
!                     break
!             if self.sort_by is None:
!                 for pi in self.param_infos:
!                     if pi[0] == att:
!                         self.sort_by = att
                          break
!             if self.sort_by is None:
!                 raise SpecificationError, "<operator> '%s': 'value' attribute '%s' is an unknown value name" \
!                       % (self.name, att)
          self.is_sql_op = True
          self.can_reduce = False
--- 2604,2629 ----
              self.sort_by = self.param_infos[0][0]
          att = mk_label(elmnt_tree.get('value'), all_nodes)
          if att != None:
!             # find value(s) by which the sorting should be done
!             good = []
!             bad = []
              self.sort_by = None
!             # treat comma like another white space separator so
!             # that "foo,bar", "foo, bar" and "foo bar" are all
!             # valid
!             for atom in att.replace(',', ' ').split():
!                 for ri in self.result_infos + self.param_infos:
!                     if ri[0] == atom:
!                         good.append(atom)
                          break
!                 else:
!                     bad.append(atom)
!             if bad:
!                 raise SpecificationError, "<operator> '%s': 'value' attribute '%s' contains unknown value name(s) '%s'" \
!                       % (self.name, att, " ".join(bad))
!             else:
!                 # comma separated for SQL query
!                 self.sort_by = ",".join(good)
          self.is_sql_op = True
          self.can_reduce = False
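The key validation in the pb_operators.py patch above can be restated as a small standalone sketch in current Python. This is an illustration under assumptions, not the perfbase code: the function name `parse_sort_keys`, the `known_names` argument, and the use of `ValueError` in place of perfbase's `SpecificationError` are all hypothetical.

```python
def parse_sort_keys(value, known_names):
    """Turn a 'value' attribute into a comma-separated key list.

    Commas are treated like white space, so "foo,bar", "foo, bar" and
    "foo bar" all name the same two keys. Every key must appear in
    `known_names`; unknown keys are reported together in one error.
    """
    keys = value.replace(",", " ").split()
    unknown = [key for key in keys if key not in known_names]
    if unknown:
        raise ValueError(f"unknown value name(s): {' '.join(unknown)}")
    # Comma-separated, ready to be placed after ORDER BY in a SQL query.
    return ",".join(keys)


known = {"S_chunk", "N_proc", "T_avg"}
print(parse_sort_keys("S_chunk, N_proc", known))
print(parse_sort_keys("S_chunk N_proc", known))
```

Collecting all unknown names before raising, as the patch does, lets a single error message list every misspelled key instead of only the first one.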
Re: sorting two-dimensional data - patches

Patrick Ohly wrote:
> Okay, that wasn't too hard. The sorting is done in a SQL ORDER BY
> clause, which accepts more than one key. It was only the value
> validation which had to be patched - see multi-order.patch. Note that I
> am not quite certain when the self.result_infos array is used in the
> lines that I modified; it always seemed to be empty in my tests. Please
> check carefully whether I broke anything...
>
> In that patch I have also updated the comments in the dtd. While I was
> at it I added a comment about how parameters are chosen for the
> independent axises. This was something that I struggled with when I got
> started with perfbase; to my knowledge it wasn't mentioned explicitly
> anywhere. I hope I got it right, please correct me if I didn't.
>
> Also attached is a patch (cut.patch) to get the postgres startup working
> on RH AS2.1. I did not investigate in detail, but the
> wc -l $PBLOG | cut -d " " -f 1
> only produced an empty result there; I found it easier to avoid the file
> name in the wc output right away like this
> cat $PBLOG | wc -l

Thanks for the patches. I will add them ASAP. I am currently very busy doing benchmarks myself - but I might need this feature as well...

> Thanks for this excellent tool. I figure I have spent about as much time
> learning how it works as I would have for writing my own analysis
> scripts, but the output is much nicer and querying more flexible with
> perfbase. Besides, it will be simpler the next time :-)

Admittedly, the learning curve is somewhat steep. But once the data is in the database and you have understood the concept of the queries, you can retrieve much more information from the data than just plotting what the benchmark spits out. Your query is a good example of this. Plus, you create a long-term overview of performance results.

Joachim