sorting two-dimensional data

by Patrick Ohly

[resent because I hadn't confirmed my subscription yet when sending;
 the original email is probably waiting for approval - sorry for the noise]

Hello,

I am trying to get data displayed in 3D with chunk size and number of
processes as the independent variables (as in the PMB 3D example), but
with my dependent variable coming out of a division operation. My query
is attached.

The input data were several text files, each containing the output of a
normal IMB run without special options, followed by more IMB runs
with different Intel Trace Collector configurations. Each file had one
fixed processor count. The purpose is to show the slowdown caused by
tracing; the result will be presented in a paper at the LCI conference
in May.

Because the text files were (intentionally) imported in increasing
processor count order, the "raw_text" output already looks right:

  <output ...
          target="raw_text" dimensions="3" color="no">
    <input label="title">src.baseline</input>
  </output>

# S_chunk[byte] N_proc[process] T_avg[us]
4       4       1.84
8       4       1.85
...
2097152         4       13257.39
4194304         4       30191.67
4       8       11.83
8       8       11.01
...
2097152         256     26673.05
4194304         256     57646.34

However, when I try target="gnuplot", I only get this error message:
        #* ERROR: 'can not plot one-dimensional input data in 3D (missing sort-<operator>?)'

apparently coming from this code:
            if self.ndims == 3 and (blank_cnt == 0 or blank_cnt == data_cnt - 1):
                # It makes no sense to plot one-dimensional datasets in 3D. The reason for this situation
                # will most likely be unsorted data. However, this is not a bullet-proof check: data that
                # is sorted in some (but not the right) way will not trigger this exception.
                raise DataError, "can not plot one-dimensional input data in 3D (missing sort-<operator>?)"
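
To make the quoted check concrete, here is a minimal Python sketch. It assumes (this is not confirmed by the excerpt above) that the gnuplot driver starts a new data block, i.e. emits a blank line, whenever the value in the first column changes; the helper names are hypothetical. Under that assumption, the raw_text order shown earlier, where S_chunk changes on every row, yields a blank line after every row, which is exactly the blank_cnt == data_cnt - 1 case.

```python
def count_block_breaks(rows):
    """Count the blank lines a driver would emit if it started a new block
    whenever the first column changes (assumed behaviour, not perfbase code)."""
    return sum(1 for prev, cur in zip(rows, rows[1:]) if cur[0] != prev[0])


def looks_one_dimensional(rows):
    """Mirror of the quoted test: either everything is a single block,
    or every row is a block of its own."""
    blank_cnt = count_block_breaks(rows)
    data_cnt = len(rows)
    return blank_cnt == 0 or blank_cnt == data_cnt - 1


# A few (S_chunk, N_proc, T_avg) rows in the raw_text order (S_chunk varies fastest).
raw_order = [(4, 4, 1.84), (8, 4, 1.85), (4, 8, 11.83), (8, 8, 11.01)]
# The same rows sorted on the first column (S_chunk).
by_chunk = sorted(raw_order, key=lambda row: row[0])

print(looks_one_dimensional(raw_order))  # True: would trigger the error
print(looks_one_dimensional(by_chunk))   # False: two blocks of two points each
```

As the comment in the excerpt says, this is a heuristic: data sorted in some other wrong way can still pass it.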

What is the right way to sort the src.baseline or op.slowdown streams? I
tried applying the sort operator as shown in the PMB/pmb_query_3d.xml
example but that didn't change anything. Thanks for your advice, it is
much needed ;-}

--
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...

[imb_scaling_3d_query.xml (2K)]

Re: sorting two-dimensional data

by Joachim Worringen

Patrick Ohly wrote:
> What is the right way to sort the src.baseline or op.slowdown streams? I
> tried applying the sort operator as shown in the PMB/pmb_query_3d.xml
> example but that didn't change anything. Thanks for your advice, it is
> much needed ;-}

Hi Patrick,

(actually responding to you from the Intel site at Dupont, WA ;-)) I
suspect the problem is that for 3D data, gnuplot requires the data
blocks (which arise from presenting a 3D matrix in a 2D file) to
be separated by blank lines. These blank lines need to be inserted by
perfbase's gnuplot "driver", as the sources and operators of course
do not care about this.

For the driver to be able to insert these blank lines correctly, the
data needs to be sorted by the first column (S_chunk) in your case.
Thus, use 'value="S_chunk"' in your sort operator, and things should
work fine.
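
A minimal sketch of the layout described here, assuming rows are (S_chunk, N_proc, T_avg) tuples: sort on the first column, then emit a blank line wherever that column changes, which is the block structure gnuplot's splot expects for grid data. The function name is hypothetical and this is not the perfbase driver code.

```python
def to_gnuplot_blocks(rows):
    """Sort rows on the first column and separate runs of equal first-column
    values with blank lines, giving gnuplot one data block per S_chunk."""
    ordered = sorted(rows, key=lambda row: row[0])
    lines = []
    for i, row in enumerate(ordered):
        if i > 0 and row[0] != ordered[i - 1][0]:
            lines.append("")  # blank line = block separator for gnuplot
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines)


rows = [(4, 4, 1.84), (8, 4, 1.85), (4, 8, 11.83), (8, 8, 11.01)]
print(to_gnuplot_blocks(rows))
```

Python's sorted() is stable, so rows with the same S_chunk keep their input order here; a database ORDER BY on a single column makes no such promise about rows with equal keys.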

  Joachim

--
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
phone ++49/(0)228/324 08 17 - http://www.dolphinics.com

Re: sorting two-dimensional data

by Patrick Ohly

On Thu, 2007-03-01 at 15:17 -0800, Joachim Worringen wrote:
> Patrick Ohly wrote:
> > What is the right way to sort the src.baseline or op.slowdown streams? I
> > tried applying the sort operator as shown in the PMB/pmb_query_3d.xml
> > example but that didn't change anything. Thanks for your advice, it is
> > much needed ;-}
>
> Hi Patrick,
>
> (actually responding to you from the Intel site at Dupont, WA ;-))

Then you are closer to the machine that I am benchmarking on than
I am. What a small world ;-)

>  I
> suspect the problem is that for 3D data, gnuplot requires the data
> blocks (which arise from presenting a 3D matrix in a 2D file) to
> be separated by blank lines. These blank lines need to be inserted by
> perfbase's gnuplot "driver", as the sources and operators of course
> do not care about this.
>
> For the driver to be able to insert these blank lines correctly, the
> data needs to be sorted by the first column (S_chunk) in your case.
> Thus, use 'value="S_chunk"' in your sort operator, and things should
> work fine.

That gets me around the error message, but apparently the sort is not
stable and randomly rearranges tuples which have the same chunk size:

4       128     161.6
4       256     191.66
4       16      46.75
4       4       1.84
4       32      100.39
4       64      126.77
4       8       11.83
8       64      126.99
8       256     225.55
8       128     159.29
8       8       11.01
8       16      46.48
8       32      100.37
8       4       1.85
16      4       1.85
16      16      46.84
16      32      100.26
16      256     201.84
16      8       11.55
16      64      127.03
16      128     159.34
32      128     159.29
32      64      126.89
32      16      46.74
32      8       11.3
32      256     196.72
32      4       1.97
32      32      100.21
...

This causes gnuplot to draw lines between (4,128,161.6) and (16,4,1.85),
etc. (or so it seems - the 3d plot is too mixed up to tell for sure). I
suspect that what is needed is a sort operator with more than one key.
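
The effect of a second sort key can be sketched in Python with a tuple key. The rows below are a subset of the output listed above, in the order shown there; the helper names are hypothetical. Python's own sort is stable, so the scrambled order inside each block stands in for the unspecified tie order the database returned. Sorting on S_chunk alone leaves each block's N_proc order as it arrived, so consecutive blocks do not line up into a grid, while sorting on (S_chunk, N_proc) does.

```python
from itertools import groupby

# (S_chunk, N_proc, T_avg) rows in the order they came back from the query.
rows = [
    (4, 128, 161.6), (4, 16, 46.75), (4, 4, 1.84),
    (16, 4, 1.85), (16, 16, 46.84), (16, 128, 159.34),
]


def block_y_sequences(rows):
    """Group consecutive rows with equal S_chunk; return each block's N_proc values."""
    return [[y for _, y, _ in block] for _, block in groupby(rows, key=lambda r: r[0])]


def is_grid(rows):
    """True if every block lists the same N_proc values in the same order."""
    seqs = block_y_sequences(rows)
    return all(seq == seqs[0] for seq in seqs)


one_key = sorted(rows, key=lambda r: r[0])           # S_chunk only
two_keys = sorted(rows, key=lambda r: (r[0], r[1]))  # S_chunk, then N_proc

print(is_grid(one_key))   # False: blocks are [128, 16, 4] and [4, 16, 128]
print(is_grid(two_keys))  # True: both blocks are [4, 16, 128]
```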

Because my data is already sorted the right way, I can work around this
by exchanging my x- and y-axes without relying on a sort operator -
but only for the original source. The result of the div operator does
not seem to preserve the ordering and I get an error regardless of how I
arrange my axes. That would only have been a kludge anyway; let me see
whether I can get further by enhancing the sort operator.

--
Best Regards, Patrick Ohly

Re: sorting two-dimensional data - patches

by Patrick Ohly

On Fri, 2007-03-02 at 07:46 +0000, Patrick Ohly wrote:

> This causes gnuplot to draw lines between (4,128,161.6) and (16,4,1.85),
> etc. (or so it seems - the 3d plot is too mixed up to tell for sure). I
> suspect that what is needed is a sort operator with more than one key.
>
> Because my data is already sorted the right way, I can work around this
> by exchanging my x- and y-axes without relying on a sort operator -
> but only for the original source. The result of the div operator does
> not seem to preserve the ordering and I get an error regardless of how I
> arrange my axes. That would only have been a kludge anyway; let me see
> whether I can get further by enhancing the sort operator.

Okay, that wasn't too hard. The sorting is done in a SQL ORDER BY
clause, which accepts more than one key. Only the value validation
had to be patched - see multi-sort.patch. Note that I am not quite
certain when the self.result_infos array is used in the lines that I
modified; it always seemed to be empty in my tests. Please check
carefully whether I broke anything...
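
The approach can be tried out in isolation. The sketch below re-implements the value parsing from the patch attached below (multi-sort.patch) as a standalone function, with a hypothetical known_names set standing in for the names in self.result_infos and self.param_infos. It then uses Python's built-in sqlite3 module as a stand-in for PostgreSQL to show that an ORDER BY clause with comma-separated keys yields the grid order. The table and function names are illustrative only.

```python
import sqlite3


def parse_sort_keys(value, known_names):
    """Validate a sort operator 'value' attribute the way the patch does:
    commas count as white space, every key must be a known value name,
    and the result is a comma-separated list for SQL ORDER BY."""
    good, bad = [], []
    for atom in value.replace(",", " ").split():
        if atom in known_names:
            good.append(atom)
        else:
            bad.append(atom)
    if bad:
        raise ValueError("unknown value name(s) '%s'" % " ".join(bad))
    return ",".join(good)


def sorted_rows(rows, value):
    """Load rows into an in-memory SQLite table and read them back ordered by 'value'."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE result (S_chunk INTEGER, N_proc INTEGER, T_avg REAL)")
        conn.executemany("INSERT INTO result VALUES (?, ?, ?)", rows)
        order_by = parse_sort_keys(value, {"S_chunk", "N_proc", "T_avg"})
        # Building the clause by concatenation is acceptable only because
        # every key was checked against the known column names above.
        query = "SELECT S_chunk, N_proc, T_avg FROM result ORDER BY " + order_by
        return conn.execute(query).fetchall()
    finally:
        conn.close()


rows = [(8, 4, 1.85), (4, 8, 11.83), (4, 4, 1.84), (8, 8, 11.01)]
print(parse_sort_keys("S_chunk, N_proc", {"S_chunk", "N_proc"}))  # S_chunk,N_proc
print(sorted_rows(rows, "S_chunk N_proc"))
```

"S_chunk,N_proc", "S_chunk, N_proc" and "S_chunk N_proc" all produce the same ORDER BY clause, matching the comment in the patch.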

In that patch I have also updated the comments in the DTD. While I was
at it, I added a comment about how parameters are chosen for the
independent axes. This was something that I struggled with when I got
started with perfbase; to my knowledge it wasn't mentioned explicitly
anywhere. I hope I got it right; please correct me if I didn't.

Also attached is a patch (cut.patch) to get the postgres startup working
on RH AS2.1. I did not investigate in detail, but the pipeline
   wc -l $PBLOG | cut -d " " -f 1
only produced an empty result there; I found it easier to keep the file
name out of the wc output altogether, like this:
   cat $PBLOG | wc -l
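
One plausible explanation, which was not verified: some wc implementations right-align the count with leading spaces, e.g. "      12 file". With -d " ", cut then sees an empty first field. The Python sketch below emulates cut -d " " -f 1 on such a line; the function name and the log file name are hypothetical. With cat $PBLOG | wc -l there is no file name, and the shell's word splitting of the unquoted backtick substitution discards any padding, much as int() does here.

```python
def cut_field(line, delim, field):
    """Emulate `cut -d DELIM -f FIELD` (1-based field number) on a single line."""
    if delim not in line:
        return line  # without -s, cut prints lines lacking the delimiter unchanged
    parts = line.split(delim)
    return parts[field - 1] if field <= len(parts) else ""


unpadded = "12 /tmp/pb.log"      # count followed directly by the file name
padded = "      12 /tmp/pb.log"  # count right-aligned with leading spaces (assumed)

print(repr(cut_field(unpadded, " ", 1)))  # '12'
print(repr(cut_field(padded, " ", 1)))    # '' (an empty first field)
print(int("      12\n"))                  # 12: padding is harmless without a file name
```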

Thanks for this excellent tool. I figure I have spent about as much time
learning how it works as I would have spent writing my own analysis
scripts, but the output is much nicer and querying is more flexible with
perfbase. Besides, it will be simpler the next time :-)

--
Best Regards, Patrick Ohly

[cut.patch]

*** bin/perfbase 2007-02-22 14:54:34.970987000 +0100
--- /home/pohly/bin/perfbase 2007-03-01 10:06:53.989312000 +0100
***************
*** 23,28 ****
--- 23,30 ----
  #     Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  #
 
+ PATH=/Projects/software/IA32-LIN/gnuplot-4.0.0/bin:$PATH
+
  # These settings will to be adapted for the individual installation by the setup script.
  PB_HOME=/home/pohly/bin
  PG_CMD_PATH=/Projects/software/IA32-LIN/postgresql-8.2.3/bin
***************
*** 30,36 ****
  PYTHON=/Projects/software/IA32-LIN/python/bin/python
  PIDOF=/sbin/pidof
 
! DB_HOST=/var/run/postgresql
  DB_PORT=5432
  DB_USER=pohly
 
--- 32,38 ----
  PYTHON=/Projects/software/IA32-LIN/python/bin/python
  PIDOF=/sbin/pidof
 
! DB_HOST=/tmp/
  DB_PORT=5432
  DB_USER=pohly
 
***************
*** 228,249 ****
  # Try to make sure that we look at the most recent log output.
  # This is not bullet-proof! Better idea anyone?
  touch $PBLOG
! n_lines=`wc -l $PBLOG | cut -d " " -f 1`
 
  $PG_CMD_PATH/postmaster $pm_port -i -D $PG_DATA_PATH >>$PBLOG 2>&1 &
 
  sleep 5
! while [ `wc -l $PBLOG | cut -d " " -f 1` -le $n_lines ] ; do
     # No lines have yet been appended, wait some more time.
     sleep 5
  done
  # Some lines have been appended. Now, we wait until nothing has been
  # appended for some time.
! n_lines=`wc -l $PBLOG | cut -d " " -f 1`
  sleep 5
! while [ `wc -l $PBLOG | cut -d " " -f 1` -gt $n_lines ] ; do
     sleep 5
!    n_lines=`wc -l $PBLOG | cut -d " " -f 1`
  done
 
  pm_state=`tail -2 $PBLOG | grep "database system is ready"`
--- 230,251 ----
  # Try to make sure that we look at the most recent log output.
  # This is not bullet-proof! Better idea anyone?
  touch $PBLOG
! n_lines=`cat $PBLOG | wc -l`
 
  $PG_CMD_PATH/postmaster $pm_port -i -D $PG_DATA_PATH >>$PBLOG 2>&1 &
 
  sleep 5
! while [ `cat $PBLOG | wc -l` -le $n_lines ] ; do
     # No lines have yet been appended, wait some more time.
     sleep 5
  done
  # Some lines have been appended. Now, we wait until nothing has been
  # appended for some time.
! n_lines=`cat $PBLOG | wc -l`
  sleep 5
! while [ `cat $PBLOG | wc -l` -gt $n_lines ] ; do
     sleep 5
!    n_lines=`cat $PBLOG | wc -l`
  done
 
  pm_state=`tail -2 $PBLOG | grep "database system is ready"`


[multi-sort.patch]

*** dtd/pb_query.dtd.orig 2007-03-02 09:49:10.646707000 +0100
--- dtd/pb_query.dtd 2007-03-02 10:02:24.313915000 +0100
***************
*** 69,74 ****
--- 69,79 ----
               yaxis      (top|bottom|auto)                  "bottom" >
 
  <!ELEMENT output      (input+,option*,filename?,endian?,intsize?,floatsize?,stringsep?,tics*)>
+ <!--         dimensions chooses between 2d and 3d plot;
+                         for 3d plots the input data must be sorted at least
+                         on the first column and usually also on the second to
+                         generate a grid - the sort operator with two keys in the value
+                         attribute can be used for this -->
  <!ATTLIST output      target       (raw_binary|raw_text|netcdf|hdf5|gnuplot|grace|latex|xml) "raw_text"
                        sweep_combine (match|alltoall)                                     "match"
                        type         (graphs|points|boxes|bars|steps)                      "graphs"
***************
*** 153,158 ****
--- 158,165 ----
  <!-- id:       unique id for reference from other objects           -->
  <!-- value:    numerical or string argument (as required by the     -->
  <!--           specific operator, i.e. scale factor)                -->
+ <!--           sort - a comma or white-space separated list of one  -->
+ <!--                  or more keys to sort by                       -->
  <!-- option:   option for the operator                              -->
  <!-- match:    determine how datasets from different sources are    -->
  <!--           matched to apply an operation on them. I.e., if      -->
***************
*** 233,239 ****
  <!ELEMENT parameter   (value,sweep?,filter*)>
  <!--                  'boolean' defines how multiple filter are combined.          -->
  <!--                  'show' defines which elements of a parameter value will show -->
! <!--                  up in the output of the query.                               -->
  <!--                  'style' further defines how data from a filter shows up in   -->
  <!--                  the output. For a value "v" with filter content "c", "full"  -->
  <!--                  prints everything ('v = c');"content" prints only the        -->
--- 240,248 ----
  <!ELEMENT parameter   (value,sweep?,filter*)>
  <!--                  'boolean' defines how multiple filter are combined.          -->
  <!--                  'show' defines which elements of a parameter value will show -->
! <!--                  up in the output of the query; those set to "auto" are used  -->
! <!--                  as the varying values on the independent axis(es) of 2d and  -->
! <!--                  3d plots                                                     -->
  <!--                  'style' further defines how data from a filter shows up in   -->
  <!--                  the output. For a value "v" with filter content "c", "full"  -->
  <!--                  prints everything ('v = c');"content" prints only the        -->
*** bin/pb_operators.py 2006-09-13 14:32:26.000000000 +0200
--- /home/pohly/bin/pb_operators.py 2007-03-02 09:48:36.838740000 +0100
***************
*** 2604,2623 ****
          self.sort_by = self.param_infos[0][0]
          att = mk_label(elmnt_tree.get('value'), all_nodes)
          if att != None:
!             # find value by which the sorting should be done
              self.sort_by = None
!             for ri in self.result_infos:
!                 if ri[0] == att:
!                     self.sort_by = att
!                     break
!             if self.sort_by is None:
!                 for pi in self.param_infos:
!                     if pi[0] == att:
!                         self.sort_by = att
                          break
!             if self.sort_by is None:
!                 raise SpecificationError, "<operator> '%s': 'value' attribute '%s' is an unknown value name" \
!                       % (self.name, att)
 
          self.is_sql_op = True
          self.can_reduce = False  
--- 2604,2629 ----
          self.sort_by = self.param_infos[0][0]
          att = mk_label(elmnt_tree.get('value'), all_nodes)
          if att != None:
!             # find value(s) by which the sorting should be done
!             good = []
!             bad = []
              self.sort_by = None
!             # treat comma like another white space separator so
!             # that "foo,bar", "foo, bar" and "foo bar" are all
!             # valid
!             for atom in att.replace(',', ' ').split():
!                 for ri in self.result_infos + self.param_infos:
!                     if ri[0] == atom:
!                         good.append(atom)
                          break
!                 else:
!                     bad.append(atom)
!             if bad:
!                 raise SpecificationError, "<operator> '%s': 'value' attribute '%s' contains unknown value name(s) '%s'" \
!                       % (self.name, att, " ".join(bad))
!             else:
!                 # comma separated for SQL query
!                 self.sort_by = ",".join(good)
 
          self.is_sql_op = True
          self.can_reduce = False  



Re: sorting two-dimensional data - patches

by Joachim Worringen

Patrick Ohly wrote:

> Okay, that wasn't too hard. The sorting is done in a SQL ORDER BY
> clause, which accepts more than one key. It was only the value
> validation which had to be patched - see multi-sort.patch. Note that I
> am not quite certain when the self.result_infos array is used in the
> lines that I modified; it always seemed to be empty in my tests. Please
> check carefully whether I broke anything...
>
> In that patch I have also updated the comments in the dtd. While I was
> at it I added a comment about how parameters are chosen for the
> independent axes. This was something that I struggled with when I got
> started with perfbase; to my knowledge it wasn't mentioned explicitly
> anywhere. I hope I got it right, please correct me if I didn't.
>
> Also attached is a patch (cut.patch) to get the postgres startup working
> on RH AS2.1. I did not investigate in detail, but the
>    wc -l $PBLOG | cut -d " " -f 1
> only produced an empty result there; I found it easier to avoid the file
> name in the wc output right away like this
>    cat $PBLOG | wc -l

Thanks for the patches. I will add them ASAP. I am currently very busy
doing benchmarks myself - but I might need this feature as well...

> Thanks for this excellent tool. I figure I have spent about as much time
> learning how it works as I would have spent writing my own analysis
> scripts, but the output is much nicer and querying is more flexible with
> perfbase. Besides, it will be simpler the next time :-)

Admittedly, the learning curve is somewhat steep. But once the data is in
the database and you have understood the concept of the queries, you can
retrieve much more information from the data than just plotting what
the benchmark spits out. Your query is a good example of this. Plus,
you build up a long-term overview of performance results.

  Joachim
