Problem and solution: Patch for new "stats" command

Problem and solution: Patch for new "stats" command

by Philipp K. Janert


Here is a problem I run into often:

Frequently, I would like to plot some data after
subtracting off the mean. Or I would like to
normalize the variation by dividing by the
standard deviation. Or I'd like to form a normalized
histogram, by passing the number of records in
the data set to "smooth frequency".

Currently, I always have to invoke an external
program to find these quantities, which is always
a little inconvenient in interactive use, and pretty
painful when scripting gnuplot. When not on Unix,
it may be quite difficult!
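
For illustration, the kind of external helper described above might look like the following minimal Python sketch. It is not part of the patch; the names `column_summary` and `standardize` and the sample data are hypothetical, and the standard deviation here divides by the number of records.

```python
import math

def column_summary(values):
    """Return the record count, the mean, and the standard deviation (dividing by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("no records")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"records": n, "mean": mean, "stddev": math.sqrt(variance)}

def standardize(values):
    """Subtract the mean, then divide by the standard deviation."""
    summary = column_summary(values)
    if summary["stddev"] == 0:
        raise ValueError("all values are identical; cannot normalize")
    return [(v - summary["mean"]) / summary["stddev"] for v in values]

# Made-up sample column, used only to exercise the helpers.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(column_summary(data))
print(standardize(data))
```

The record count returned here is the number one would hand to a normalized histogram.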

For the last few weeks, Zoltán and I have been
working on a command that calculates the most
important such quantities from a data file, displays
them, and (optionally) assigns them to variables in
the current gnuplot session.

You can see some examples of what you can do
with this command here:
        http://www.phyast.pitt.edu/~zov1/gnuplot/patch/stats.html
and you can read the full documentation here:
        http://www.phyast.pitt.edu/~zov1/gnuplot/patch/stats_help.html

I uploaded a patch with our changes to sourceforge.

We'd like to hear feedback and suggestions. Is this
useful? Are we missing anything?

We'd also like to encourage everyone to build the
patch and play with it - each additional user finds
a new class of bugs!

Best,

                Ph.

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
gnuplot-beta mailing list
gnuplot-beta@...
https://lists.sourceforge.net/lists/listinfo/gnuplot-beta

valid/invalid points [was: Patch for new "stats" command]

by Ethan Merritt

On Sunday 08 November 2009, Philipp K. Janert wrote:
> We'd like to hear feedback and suggestions. Is this
> useful? Are we missing anything?

- I don't know how to interpret this behaviour:
gnuplot> stats '-' using (1)
input data ('e' ends) > 1  
input data ('e' ends) > 2  
input data ('e' ends) > NaN
input data ('e' ends) > 4
input data ('e' ends) > 5

* FILE:
  Records:      4
  Invalid:      0
  Blank:        0
  Data Blocks:  1

Why is the NaN not listed as either "invalid" or "blank"?
Same thing happens for '?' or Inf or junk.
I don't think this is being tracked correctly.

NB: We exchanged Email off-list about making df_readline() more consistent
about returning DF_UNDEFINED, DF_MISSING, and so on.  Totally aside from
keeping statistics, does anyone object to making
      plot 'foo'
      plot 'foo' using 1:2
      plot 'foo' using ($1):($2)
all consistently return DF_MISSING and DF_UNDEFINED?
Right now all three behave differently.
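
To make the expected bookkeeping concrete, here is a minimal Python sketch of one possible classification policy. It is an assumption for illustration only: it does not describe what df_readline() or the patch actually do, and `classify_records` is a hypothetical name. Under this policy NaN, Inf, '?' and other junk all count as invalid.

```python
import math

def classify_records(lines):
    """Count valid records, invalid entries, and blank lines in single-column input."""
    counts = {"records": 0, "invalid": 0, "blank": 0}
    for line in lines:
        text = line.strip()
        if not text:
            counts["blank"] += 1
            continue
        try:
            value = float(text.split()[0])
        except ValueError:
            # Non-numeric junk such as '?' lands here.
            counts["invalid"] += 1
            continue
        if math.isnan(value) or math.isinf(value):
            counts["invalid"] += 1
        else:
            counts["records"] += 1
    return counts

print(classify_records(["1", "2", "NaN", "4", "5"]))
```

With the five input lines from the example above, this policy reports four valid records and one invalid entry.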
     


Function stats [was: Patch for new "stats" command]

by Ethan Merritt

 On Sunday 08 November 2009, Philipp K. Janert wrote:
> We'd like to hear feedback and suggestions. Is this
> useful? Are we missing anything?
 
Some first thoughts:
 
The behaviour for functions is not as obvious as for files of
data points.  For example:
 
 gnuplot> set xrange [0:10]
 gnuplot> stats '+' using 1:(sin($1))
 
 * FILE:
   Records:      100
 
 * COLUMNS:
   Mean:          5.0000              0.1792
   Minimum:       0.0000 [  1]       -0.9994 [ 48]
   Quartile:      2.5253 [ 26]       -0.3837 [ 36]
   Median:        5.0505 [ 51]        0.3082 [ 29]
   Quartile:      7.5758 [ 76]        0.8075 [ 85]
   Maximum:      10.0000 [100]        0.9997 [ 79]
 
 gnuplot> set samples 1000
 gnuplot> stats '+' using 1:(sin($1))
 
 * FILE:
   Records:      1000
 
 * COLUMNS:
   Mean:          5.0000               0.1835
   Minimum:       0.0000 [   1]       -1.0000 [ 472]
   Quartile:      2.5025 [ 251]       -0.3941 [ 983]
   Median:        5.0050 [ 501]        0.3149 [  33]
   Quartile:      7.5075 [ 751]        0.8113 [ 848]
   Maximum:      10.0000 [1000]        1.0000 [ 158]
 
I find several things disconcerting about this output, although
I know what the underlying causes are.
 
 - The min/max are artifacts of the sampling.
   They're not even symmetric, even though sin(x) is a symmetric function.
   You can reduce the problem by increasing the number of samples, but I
   think more drastic alternatives should be considered:

   1) The 'stats' command could refuse to operate on functions
   2) The 'stats' command could temporarily bump up the sampling rate
      by 100x
   3) The 'stats' command could do a systematic search in the area of
      the nominal extrema to determine more accurate values.
      Even so, if the sampling is too coarse it may miss a true extremum
      that lies elsewhere.

 - The "mean" of a periodic function would normally be calculated over
   one period of the function rather than an arbitrary range.
   Yeah, I know, I gave an explicit xrange.  But still...
 
 - The quantities in [] are documented as "the" point at which the
   min/max/whatever occurs.  But there is no expectation for either data
   or functions that the minimum, for example, is achieved at only a
   single point.   I don't think it makes any sense to give these values
   unless the data or function is monotonic.  And given sampling artifacts,
   it probably makes no sense to give them for function data at all.
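
The sampling effect can be reproduced outside gnuplot. The following is a minimal Python sketch, assuming the '+' pseudo-file evaluates the function at evenly spaced points that include both ends of the xrange (this is consistent with the x quartiles in the output above). `sampled_summary` is a hypothetical helper, not gnuplot code.

```python
import math

def sampled_summary(func, lo, hi, samples):
    """Evaluate func at evenly spaced points in [lo, hi] and summarize the values."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    ys = [func(x) for x in xs]
    return {"mean": sum(ys) / samples, "min": min(ys), "max": max(ys)}

# Coarser sampling misses the true extrema of sin(x) by a larger margin.
for n in (100, 1000):
    print(n, sampled_summary(math.sin, 0.0, 10.0, n))
```

Since no sample point falls exactly on an odd multiple of pi/2, the sampled extrema stay inside (-1, 1), and the maximum and minimum differ in magnitude because the grid misses the peak and the trough by different amounts.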


Re: Function stats [was: Patch for new "stats" command]

by Philipp K. Janert

On Sunday 08 November 2009 09:14:16 pm Ethan Merritt wrote:
>  On Sunday 08 November 2009, Philipp K. Janert wrote:
> > We'd like to hear feedback and suggestions. Is this
> > useful? Are we missing anything?
>
> Some first thoughts:

Thanks for checking it out!

>
> The behaviour for functions is not as obvious as for files of
> data points.  For example:

I am not sure. I find this a little unfair. There is
no claim that the stats command does function
minimization. It finds the extrema in the data sets
passed to it. And that it does correctly, I think.
(Even in the example given below.)

Let me state it again: the stats command works
on data sets. Not functions (in the analytic sense).
I don't think it would be reasonable to expect
anything else.

Regarding "the" min/max : you are right, the documentation
could be clearer. If there are multiple points in a data set,
all of which are of the same (minimal) value, then the stats
command currently makes no guarantee for which of those
points it will report the position in the file. It will just report
the position of one of them.
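
For comparison, reporting every position that shares the minimal value is straightforward. This is a minimal Python sketch and not how the patch behaves; `positions_of_minimum` is a hypothetical name, and positions are 1-based to match the bracketed indices in the stats output.

```python
def positions_of_minimum(values):
    """Return the minimal value and the 1-based positions of all records equal to it."""
    lowest = min(values)
    return lowest, [i + 1 for i, v in enumerate(values) if v == lowest]

print(positions_of_minimum([3, 1, 2, 1]))
```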

>
>  gnuplot> set xrange [0:10]
>  gnuplot> stats '+' using 1:(sin($1))
>
>  * FILE:
>    Records:      100
>
>  * COLUMNS:
>    Mean:          5.0000              0.1792
>    Minimum:       0.0000 [  1]       -0.9994 [ 48]
>    Quartile:      2.5253 [ 26]       -0.3837 [ 36]
>    Median:        5.0505 [ 51]        0.3082 [ 29]
>    Quartile:      7.5758 [ 76]        0.8075 [ 85]
>    Maximum:      10.0000 [100]        0.9997 [ 79]
>
>  gnuplot> set samples 1000
>  gnuplot> stats '+' using 1:(sin($1))
>
>  * FILE:
>    Records:      1000
>
>  * COLUMNS:
>    Mean:          5.0000               0.1835
>    Minimum:       0.0000 [   1]       -1.0000 [ 472]
>    Quartile:      2.5025 [ 251]       -0.3941 [ 983]
>    Median:        5.0050 [ 501]        0.3149 [  33]
>    Quartile:      7.5075 [ 751]        0.8113 [ 848]
>    Maximum:      10.0000 [1000]        1.0000 [ 158]
>
> I find several things disconcerting about this output, although
> I know what the underlying causes are.
>
>  - The min/max are artifacts of the sampling.
>    They're not even symmetric even though sin(x) is a symmetric function
>    You can reduce the problem by increasing the number of samples, but I
>    think more drastic alternatives should be considered
>
>    1) The 'stats' command could refuse to operate on functions
>    2) The 'stats' command could temporarily bump up the sampling rate
>       by 100x
>    3) The 'stats' command could do a systematic search in the area of
>       the nominal extrema to determine more accurate values.
>       Even so, if the sampling is too coarse it may miss a true extremum
>       that lies elsewhere.
>
>  - The "mean" of a periodic function would normally be calculated over
>    one period of the function rather than an arbitrary range.
>    Yeah, I know, I gave an explicit xrange.  But still...
>
>  - The quantities in [] are documented as "the" point at which the
>    min/max/whatever occurs.  But there is no expectation for either data
>    or functions that the minimum, for example, is achieved at only a
>    single point.   I don't think it makes any sense to give these values
>    unless the data or function is monotonic.  And given sampling artifacts,
>    it probably makes no sense to give them for function data at all.




Re: valid/invalid points [was: Patch for new "stats" command]

by Philipp K. Janert

On Sunday 08 November 2009 09:11:58 pm Ethan Merritt wrote:
> On Sunday 08 November 2009, Philipp K. Janert wrote:
> > We'd like to hear feedback and suggestions. Is this
> > useful? Are we missing anything?
>

I think this is due to the "using (1)" - it should be ($1).
When I do this with "using ($1)", the stats command
reports on the invalid entry as it should.


> - I don't know how to interpret this behaviour:
> gnuplot> stats '-' using (1)
> input data ('e' ends) > 1
> input data ('e' ends) > 2
> input data ('e' ends) > NaN
> input data ('e' ends) > 4
> input data ('e' ends) > 5
>
> * FILE:
>   Records:      4
>   Invalid:      0
>   Blank:        0
>   Data Blocks:  1
>
> Why is the NaN not listed as either "invalid" or "blank"?
> Same thing happens for '?' or Inf or junk.
> I don't think this is being tracked correctly.
>
> NB: We exchanged Email off-list about making df_readline() more consistent
> about returning DF_UNDEFINED, DF_MISSING, and so on.  Totally aside from
> keeping statistics, does anyone object to making
>       plot 'foo'
>       plot 'foo' using 1:2
>       plot 'foo' using ($1):($2)
> all consistently return DF_MISSING and DF_UNDEFINED?
> Right now all three behave differently.

I completely agree. It would be great if df_readline()
behaved consistently. But we did not feel
confident enough to change a routine that is
so central.




Re: Problem and solution: Patch for new "stats" command

by Tait

>         http://www.phyast.pitt.edu/~zov1/gnuplot/patch/stats.html

Forgive me for not actually looking at the code (yet), but a couple
questions sprang immediately to mind.

Mean as used here seems to be the arithmetic mean. What about the geometric
mean? (Or harmonic mean, or any of the other types of averages?)

Is the standard deviation calculated as for a population, or a sample? Is
there a way for a user to request the other?

I have always addressed these sorts of issues by using a tool that's
designed for manipulating arbitrary data sets (Perl in my case, but there
are others). This has the advantage of providing infinite flexibility,
subroutines, complex logical conditions, modules or libraries, abstraction
and re-use, and all those things that gnuplot doesn't have. The external
tool then generates output that's fed to gnuplot.

I wonder, rather than providing a restricted set of pre-defined functions,
is there a way to allow the user to provide a formula or expression that
will be applied across multiple rows? Then the user could calculate the
mean (whatever that means to their application) or standard deviation or
some other arbitrary metric on their own.

Tait



Re: Problem and solution: Patch for new "stats" command

by Philipp K. Janert


Based on some of the comments regarding our
suggested "stats" command, I think I need to
clarify something.

There was a question whether the stats command
works on "a sample or a population". Neither!
It works on a DATA FILE, like the rest of gnuplot.
(And therefore there is no ambiguity.)

Gnuplot knows nothing about sampling or statistical
methods - it is a plotting tool. And this command is
intended as an addition to gnuplot's plotting capabilities
by giving you some useful information about the file that
you are plotting from.

Best,

                Ph.




Re: Problem and solution: Patch for new "stats" command

by Philipp K. Janert


[snip]
>
> I have always addressed these sorts of issues by using a tool that's
> designed for manipulating arbitrary data sets (Perl in my case, but there
> are others). This has the advantage of providing infinite flexibility,
> subroutines, complex logical conditions, modules or libraries, abstraction
> and re-use, and all those things that gnuplot doesn't have. The external
> tool then generates output that's fed to gnuplot.

So do I, and I have found that there is a small set
of properties I calculate much more often than
others. Those are pretty much those included in
our stats command. The idea here is to provide a
convenience for the 80% case. If I want to do something
"fancy", or just anything that is special or unique to
one particular data set, I think it is much more
appropriate to hack that up as an external Perl
script.

>
> I wonder, rather than providing a restricted set of pre-defined functions,
> is there a way to allow the user to provide a formula or expression that
> will be applied across multiple rows? Then the user could calculate the
> mean (whatever that means to their application) or standard deviation or
> some other arbitrary metric on their own.

That is an interesting suggestion, but it obviously
goes way beyond the scope of what we were attempting
here.

And I am not sure that I would want to see that in
gnuplot. Gnuplot's strength is that it is JUST a
plotting tool. Because of that, it is simple and
straightforward (no need to deal with a command
language).

If you want statistical functions with general programming
capabilities AND graphics, use R. ;-)

>
> Tait

Re: Problem and solution: Patch for new "stats" command

by Jonathan Thornburg

On Mon, 9 Nov 2009, Philipp K. Janert wrote:
> There was a question whether the stats command
> works on "a sample or a population". Neither!
> It works on a DATA FILE, like the rest of gnuplot.
> (And therefore there is no ambiguity.)

I'm sorry, but I still don't know whether this means it uses N or N-1
weighting.  I would find it useful to have *both* printed out -- each is
valuable in different circumstances.

--
-- "Jonathan Thornburg [remove -animal to reply]" <jthorn@...>
   Dept of Astronomy, Indiana University, Bloomington, Indiana, USA
   "C++ is to programming as sex is to reproduction. Better ways might
    technically exist but they're not nearly as much fun." -- Nikolai Irgens


Re: Problem and solution: Patch for new "stats" command

by Philipp K. Janert

On Monday 09 November 2009 07:29:10 am Jonathan Thornburg wrote:
> On Mon, 9 Nov 2009, Philipp K. Janert wrote:
> > There was a question whether the stats command
> > works on "a sample or a population". Neither!
> > It works on a DATA FILE, like the rest of gnuplot.
> > (And therefore there is no ambiguity.)
>
> I'm sorry, but I still don't know whether this means it uses N or N-1
> weighting.  I would find it useful to have *both* printed out -- each is
> valuable in different circumstances.

I see. That makes more sense.

Currently, the stats command divides by N.



Re: Problem and solution: Patch for new "stats" command

by Zoltán Vörös

Philipp K. Janert <janert <at> ieee.org> writes:

>
> On Monday 09 November 2009 07:29:10 am Jonathan Thornburg wrote:
> > On Mon, 9 Nov 2009, Philipp K. Janert wrote:
> > > There was a question whether the stats command
> > > works on "a sample or a population". Neither!
> > > It works on a DATA FILE, like the rest of gnuplot.
> > > (And therefore there is no ambiguity.)
> >
> > I'm sorry, but I still don't know whether this means it uses N or N-1
> > weighting.  I would find it useful to have *both* printed out -- each is
> > valuable in different circumstances.
>
> I see. That makes more sense.
>
> Currently, the stats command divides by N.

I don't really see the difference: since the sum of whatever quantity you want
is reported, as is the number of records, you can define either sum_x / records
or sum_x / (records-1). Similar argument applies to the standard deviations and
the like. I don't want to add new variables for quantities that can easily be
calculated from existing ones.
Best,
Zoltán
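
For the standard deviation, the step from N to N-1 weighting is a rescaling of the variance by N/(N-1). The following minimal Python sketch shows the conversion. It assumes that a sum of squares is available alongside the sum and the record count, which the thread does not confirm; `stddev_from_sums` and its parameter names are hypothetical.

```python
import math

def stddev_from_sums(records, sum_x, sumsq_x):
    """Return (divide-by-N, divide-by-(N-1)) standard deviations from running sums."""
    mean = sum_x / records
    # Variance with N weighting; clamp tiny negative values caused by rounding.
    var_n = max(sumsq_x / records - mean * mean, 0.0)
    stddev_n = math.sqrt(var_n)
    if records > 1:
        stddev_n1 = math.sqrt(var_n * records / (records - 1))
    else:
        stddev_n1 = float("nan")  # undefined for a single record
    return stddev_n, stddev_n1

# Eight records with sum 40 and sum of squares 232.
print(stddev_from_sums(8, 40.0, 232.0))
```

Computing a variance from raw sums can lose precision when the mean is large compared with the spread, so this is a convenience rather than a numerically careful method.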



Re: Problem and solution: Patch for new "stats" command

by Zoltán Vörös

Tait <gnuplot-devel <at> t41t.com> writes:

>
> Mean as used here seems to be the arithmetic mean. What about the geometric
> mean? (Or harmonic mean, or any of the other types of averages?)

Harmonic mean is easily done with the present stats command as
stats 'foo' u 1:(1.0/$2) noout var
harmonic = records / sum_y
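
The same arithmetic, written out as a minimal Python sketch for checking results by hand (`harmonic_mean` is a hypothetical name; the reciprocal plays the role of the `(1.0/$2)` column expression):

```python
def harmonic_mean(values):
    """Number of records divided by the sum of reciprocals."""
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return len(values) / sum(1.0 / v for v in values)

print(harmonic_mean([1.0, 2.0, 4.0]))
```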

> I have always addressed these sorts of issues by using a tool that's
> designed for manipulating arbitrary data sets (Perl in my case, but there
> are others). This has the advantage of providing infinite flexibility,

That is sort of a platform-dependent solution for a problem that comes up too
often, but Philipp has already discussed this issue at length, I believe.

> I wonder, rather than providing a restricted set of pre-defined functions,
> is there a way to allow the user to provide a formula or expression that
> will be applied across multiple rows? Then the user could calculate the
> mean (whatever that means to their application) or standard deviation or
> some other arbitrary metric on their own.

stats 'foo' u ($2*$3+cos($4)) should work as it is, if that is what you meant. A
fairly large set of quantities can be calculated using the variables that are
produced by stats, if the proper function is applied to the columns beforehand.
Best,
Zoltán
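
The pattern described here (transform each value, take the ordinary mean, then undo the transform) also covers the geometric mean asked about earlier. The following is a minimal Python sketch of that pattern; `transformed_mean` and `geometric_mean` are hypothetical names, not anything the patch provides.

```python
import math

def transformed_mean(values, forward, inverse):
    """Apply forward to each value, average the results, then apply inverse."""
    return inverse(sum(forward(v) for v in values) / len(values))

def geometric_mean(values):
    """Geometric mean as exp of the arithmetic mean of the logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return transformed_mean(values, math.log, math.exp)

print(geometric_mean([1.0, 4.0, 16.0]))
```

Passing the reciprocal as both `forward` and `inverse` gives the harmonic mean from the same helper.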



Re: Problem and solution: Patch for new "stats" command

by Ethan Merritt

On Sunday 08 November 2009 12:17:50 Philipp K. Janert wrote:

> For the last few weeks,  Zoltán and I have been
> working on a command that calculates the most
> important such quantities from a data file, displays
> them, and (optionally) assigns them to variables in
> the current gnuplot session.
>
> You can see some examples of what you can do
> with this command here:
>         http://www.phyast.pitt.edu/~zov1/gnuplot/patch/stats.html
> and you can read the full documentation here:
>         http://www.phyast.pitt.edu/~zov1/gnuplot/patch/stats_help.html
>
> I uploaded a patch with our changes to sourceforge.
>
> We'd like to hear feedback and suggestions.

A nice starting point for development.

> Is this useful? Are we missing anything?

I had occasion to use your patch in the real world yesterday.
I was making figures from data stored in a csv file, with 16 columns
of data each corresponding to one experiment.  I wanted to normalize
the curves so that each one filled the range [0:1] when superimposed.

set datafile separator ','
data = "My Data File"

stats data using 5 variable="col5"
stats data using 6 variable="col6"
...
stats data using 20 variable="col20"

plot ....

All great.
But having to type 16 separate commands and then hunt for the
results to construct a single complicated plotting command was tedious.
So I have several specific suggestions for improvement.

1) The syntax    
    stats data using 20 variable="col20"
did not do at all what I expected.  I expected to get variables
    col20_max_x  and so on,
but what I actually got was variables with embedded quote marks:
  "col20"max_x
Trying to embed this in a plot command was a pain.
 
I suggest that the syntax should be
    stats data using <foo> name <string>
and should produce variables named
    GPSTAT_string_max_x  
    GPSTAT_string_npoints  (NB: not ...nrecords)
Except that I really don't like this mechanism very much.

For one thing, we don't currently have any easy way to undefine all
these variables.  "show var GPSTAT" would show all of them, but
"undefine GPSTAT" doesn't get rid of them.  That's fixable, I suppose,
but I don't like the variables for another reason...

2)  Here's the command I want to issue in the end:

   plot for [col=5:20] data using (column(4)) :  (column(col) / statmax(col))  \
        title label(col)

It doesn't work, because I can't figure out how to define a function statmax(col)
that retrieves the desired value.  We don't have a user-level command in gnuplot
that will retrieve the value of a gnuplot variable by its string name.
Yesterday I had to forgo the iterator and type in a 16-line plot command instead.

So I want to request a different mechanism for storing and retrieving the
stats values.  You've seen this before, but here it comes again:
I don't want dozens of variables to be created by every stats command,
because they are too hard to retrieve inside a script.  Instead I want each
stats command to load a structure, and I want a set of functions that retrieve
the previously calculated stats values, indexed by name.  If you want to load
a named variable from one of the stats values, fine.  Just say
   Run5_xmin = statmin("Run5")
That will persist across a save/load sequence, for instance, even though the
internal stats structures will not.
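
The structure-plus-accessor idea can be sketched in a few lines of Python. This only illustrates the proposed design; the class `StatsRegistry` and its method names are hypothetical and mirror the statmin()/statmax() functions suggested above.

```python
class StatsRegistry:
    """Keep one result structure per name instead of many loose variables."""

    def __init__(self):
        self._results = {}

    def stats(self, name, values):
        """Compute and store the summary for one named data set."""
        self._results[name] = {
            "min": min(values),
            "max": max(values),
            "npoints": len(values),
        }

    def statmin(self, name):
        return self._results[name]["min"]

    def statmax(self, name):
        return self._results[name]["max"]

    def names(self):
        return sorted(self._results)

    def reset(self):
        """Discard all stored results at once."""
        self._results.clear()

registry = StatsRegistry()
registry.stats("Run5", [1.0, 3.0, 2.0])
run5_xmin = registry.statmin("Run5")  # copy one value into an ordinary variable
print(run5_xmin, registry.statmax("Run5"))
```

Because lookups go through a string key, a loop can build the key from its index, which is the piece missing from the variable-based approach.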

3)  For convenience, the stats command should accept an iterator.
My plots yesterday could then have been created in two commands:

   stats for [col=5:20] data using col name "Run".col
   plot  for [col=5:20] data using (column(4)) : \
         (column(col) / statmax("Run".col)) \
         title sprintf("Run%d",col)

Note that "Run".col  is the same as sprintf("Run%d",col).
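
The per-column normalization that the two iterated commands express can be sketched in Python as follows. The function `normalize_columns` and the sample rows are hypothetical, and column numbers are 1-based to match gnuplot's `using` convention.

```python
def normalize_columns(rows, columns):
    """Divide every value in each selected column by that column's maximum."""
    maxima = {c: max(row[c - 1] for row in rows) for c in columns}
    for c, m in maxima.items():
        if m == 0:
            raise ValueError("column %d has maximum 0; cannot normalize" % c)
    return {c: [row[c - 1] / maxima[c] for row in rows] for c in columns}

# Made-up rows; columns 2 and 3 stand in for the experiment columns.
rows = [[0.0, 2.0, 10.0],
        [1.0, 4.0, 5.0]]
print(normalize_columns(rows, [2, 3]))
```

As in the plot command above, dividing by the maximum maps non-negative data into [0:1]; data with negative values would also need the minimum subtracted.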

4) I suggest adding a mechanism for explicitly clearing out the set of
stats calculations.  One obvious syntax is
    reset stats



--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742


unset var [was Re: Problem and solution: Patch for new "stats" command]

by Hans-Bernhard Bröker

Ethan Merritt wrote:
[...]

> For one thing, we don't currently have any easy way to undefine all
> these variables.  "show var GPSTAT" would show all of them, but
> "undefine GPSTAT" doesn't get rid of them.  That's fixable, I suppose,
> but I don't like the variables for another reason...

> 4) I suggest adding a mechanism for explicitly clearing out the set of
> stats calculations.  One obvious syntax is
>     reset stats

This one deserves generalization.  Unless I missed something, we
currently don't have a command to get rid of any variable (short of
'exit' and starting a new session), whereas the number of variables we
have gnuplot create by itself seems to be increasing all the time (first
"fit" results, then GPVAL_*, now possibly GPSTAT_*).

I think a generic

        unset variable {<name>| pattern <regex> | fit | stats}

command would be in order.  I would prefer 'unset' over 'reset' here
because it matches 'show variables' a bit better than 'reset'.  It
should probably be extended to user-defined functions, too.  And maybe
we should even allow

        set variable <var>=<expr>
and
        set function <name>(<arguments>)=<expression>

as an optional syntax instead of the usual <var>=<expr> etc., too.
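
To illustrate the name and pattern forms of the proposed command, here is a minimal Python sketch that treats the variable table as a dictionary. It says nothing about how gnuplot stores variables internally; `unset_variables` and the sample GPSTAT name are hypothetical.

```python
import re

def unset_variables(table, name=None, pattern=None):
    """Remove one variable by exact name, or every variable matching a regex.

    Returns the list of names that were removed.
    """
    if name is not None:
        doomed = [name] if name in table else []
    elif pattern is not None:
        regex = re.compile(pattern)
        doomed = [key for key in table if regex.search(key)]
    else:
        doomed = []
    for key in doomed:
        del table[key]
    return doomed

variables = {"GPSTAT_col20_max_x": 1.0, "GPVAL_TERM": "x11", "a": 2}
print(unset_variables(variables, pattern=r"^GPSTAT_"))
print(sorted(variables))
```

A pattern anchored with `^` behaves like a prefix match, so prefix-based removal falls out of the regex form.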



arrays [was Re: Problem and solution: Patch for new "stats" command]

by Hans-Bernhard Bröker

Ethan Merritt wrote:
> 2)  Here's the command I want to issue in the end:
>
>    plot for [col=5:20] data using (column(4)) :  (column(col) / statmax(col))  \
>         title label(col)
>
> It doesn't work, because I can't figure out how to define a function statmax(col)
> that retrieves the desired value.  We don't have a user-level command in gnuplot
> that will retrieve the value of a gnuplot variable by its string name.

Maybe more to the point, we lack array variables, which are exactly what
this really would call for.  Collections you loop over to retrieve
individual elements by numbers are just that: arrays.  Whether they be
emulated by fancy variable name construction from fragments, or a
function taking the index as an argument, they're still just arrays.
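
To make the equivalence concrete, here is a small Python model (not gnuplot code). It assumes a plain dictionary standing in for the variable table; the names Run5_max ... Run20_max and their values are made up for illustration. Looking a value up by a constructed name and looking it up by index give the same result:

```python
# Hypothetical model of a variable table; names and values are invented.
variables = {f"Run{col}_max": float(col) * 10.0 for col in range(5, 21)}

def statmax_by_name(col):
    """Emulate an array by building the variable name from fragments."""
    return variables[f"Run{col}_max"]

# The same data held as a real array, indexed by column number.
first_col = 5
maxima = [variables[f"Run{col}_max"] for col in range(first_col, 21)]

def statmax_by_index(col):
    """Look the value up by position instead of by constructed name."""
    return maxima[col - first_col]

# Both access paths agree for every column in the loop range.
same = all(statmax_by_name(c) == statmax_by_index(c) for c in range(5, 21))
print(same)
```

Either way the caller loops over an integer and gets back one element, which is the sense in which both are "just arrays".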

OTOH, arrays (a.k.a. vectors, and eventually matrices) would be one more
step towards mimicking MatLab.  Which we used to say we weren't going to do.

Maybe all those quirks popping up are to warn us that this is not a
direction we should continue walking in.


Re: unset var [was Re: Problem and solution: Patch for new "stats" command]

by Ethan Merritt

On Tuesday 10 November 2009 13:00:16 Hans-Bernhard Bröker wrote:

> This one deserves generalization.  Unless I missed something, we
> currently don't have a command to get rid of any variable (short of
> 'exit' and starting a new session),

Yeah, we do.  You can
   undefine VARNAME
But it doesn't take wildcards.

Perhaps it should do the same as the "show var PREFIX" command,
and treat the string as a leading prefix to the full variable name.
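
The prefix-matching idea can be sketched in a few lines of Python. This is only a model of the proposed behaviour, not gnuplot code: the dictionary stands in for the variable table, and undefine_prefix and the sample variable names are hypothetical.

```python
def undefine_prefix(variables, prefix):
    """Remove every variable whose name starts with `prefix`.

    Returns the sorted list of names that were removed.
    """
    doomed = sorted(name for name in variables if name.startswith(prefix))
    for name in doomed:
        del variables[name]
    return doomed

# Invented sample table mixing stats results with other variables.
table = {"GPSTAT_max_x": 4.0, "GPSTAT_min_x": 1.0, "GPVAL_TERM": "wxt", "a": 2}
removed = undefine_prefix(table, "GPSTAT")
print(removed)        # ['GPSTAT_max_x', 'GPSTAT_min_x']
print(sorted(table))  # ['GPVAL_TERM', 'a']
```

A plain prefix test like this needs no pattern parser, which is its main attraction over a regex-based syntax.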

> whereas the number of variables we  
> have gnuplot create by itself seems to be increasing all the time (first
> "fit" results, then GPVAL_*, now possibly GPSTAT_*).
>
> I think a generic
>
> unset variable {<name>| pattern <regex> | fit | stats}
>
> command would be in order.  I would prefer 'unset' over 'reset' here
> because it matches 'show variables' a bit better than 'reset'.  

Hmm.  But it isn't  "set var foo",  it's "var = foo".
So "unset" is misleading.

> It should probably be extended to user-defined functions, too.

Yes.  Good point.

But there is a possible "gotcha".  If you undefine a function that is
called by another previously-defined function, bad things could happen.


> And maybe  
> we should even allow
>
> set variable <var>=<expr>
> and
> set function <name>(<arguments>)=<expression>
>
> as an optional syntax instead of the usual <var>=<expr> etc., too.

Yes, that would be the other way to justify use of "unset" :-)
But it seems a more drastic change than extending "undefine".

--
Ethan A Merritt


Re: unset var [was Re: Problem and solution: Patch for new "stats" command]

by Hans-Bernhard Bröker

Ethan Merritt wrote:

> Hmm.  But it isn't  "set var foo",  it's "var = foo".
> So "unset" is misleading.

But we _do_ "show var foo", so "{set|unset} var foo" would match the
usual relation between show/set/unset rather more nicely than a new
command of its own.

> But there is a possible "gotcha".  If you undefine a function that is
> called by another previously-defined function, bad things could happen.

None of those would be worse than the bad things already happening by

*) never defining a variable used by some function in the first place
*) never defining a function used by another function in the first place
*) undefining a variable that was referenced by some function
*) never defining a variable used to set the value of another variable
*) never defining a function called to set the value of a variable
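
A small Python model illustrates why these cases belong to the same class. It assumes that names are resolved only when a function is called; that is an assumption of the model, not a statement about gnuplot's internals, and the functions f and g are hypothetical.

```python
# Hypothetical model: user-defined functions are stored by name and
# resolved only when called, so a missing definition surfaces at call time.
functions = {}

def call(name, x):
    """Look up a function by name and apply it, failing if it is missing."""
    if name not in functions:
        raise NameError(f"undefined function: {name}")
    return functions[name](x)

functions["g"] = lambda x: 2 * x
functions["f"] = lambda x: call("g", x) + 1   # f refers to g by name

print(call("f", 3))       # 7

del functions["g"]        # "undefine" g while f still refers to it
message = None
try:
    call("f", 3)
except NameError as err:
    message = str(err)
print(message)            # undefined function: g
```

In this model, removing g after the fact produces exactly the error that would have appeared had g never been defined.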




Re: Problem and solution: Patch for new "stats" command

by Philipp K. Janert


I was out all day so I am joining this thread a little later.

>
> I had occasion to use your patch in the real world yesterday.
> I was making figures from data stored in a csv file, with 16 columns
> of data each corresponding to one experiment.  I wanted to normalize
> the curves so that each one filled the range [0:1] when superimposed.

Thanks for giving it a "real-world whirl".

>
> set datafile separator ','
> data = "My Data File"
>
> stats data using 5 variable="col5"
> stats data using 6 variable="col6"
> ...
> stats data using 20 variable="col20"
>
> plot ....
>
> All great.
> But having to type 16 separate commands and then hunt for the
> results to construct a single complicated plotting command was tedious.
> So I have several specific suggestions for improvement.
>
> 1) The syntax
>     stats data using 20 variable="col20"
> did not do at all what I expected.  I expected to get variables
>     col20_max_x  and so on,
> but what I actually got was variables with embedded quote marks:
>   "col20"max_x
> Trying to embed this in a plot command was a pain.

The quotes are your addition. We don't expect them,
but we don't actively remove them. Maybe we should -
I admit that there is a user expectation that "there should
be quotes". (I know, because I found myself adding them.)
On the other hand, I don't want to have a command that is
too smart (removing quotes silently, but leaving everything
else alone...)

>
> I suggest that the syntax should be
>     stats data using <foo> name <string>
> and should produce variables named
>     GPSTAT_string_max_x
>     GPSTAT_string_npoints  (NB: not ...nrecords)
> Except that I really don't like this mechanism very much.

Zoltan and I discussed this. Our idea was to keep the
variable names short and user-friendly. The GPSTAT_
prefix really does not help very much. If users want to
avoid polluting their own namespace, we give them the
ability to choose their own prefixes.

>
> For one thing, we don't currently have any easy way to undefine all
> these variables.  "show var GPSTAT" would show all of them, but
> "undefine GPSTAT" doesn't get rid of them.  That's fixable, I suppose,
> but I don't like the variables for another reason...
>
> 2)  Here's the command I want to issue in the end:
>
>    plot for [col=5:20] data using (column(4)) :  (column(col) / statmax(col))  \
>         title label(col)
>
> It doesn't work, because I can't figure out how to define a function
> statmax(col) that retrieves the desired value.  We don't have a user-level
> command in gnuplot that will retrieve the value of a gnuplot variable by
> its string name. Yesterday I had to forgo the iterator and type in a
> 16-line plot command instead.

If I see this correctly, mostly you would like to add
support for iteration into the stats command? This
is certainly something we can think about.
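
For reference, the computation being asked for (one statistics pass per column, then a division by each column's maximum) looks like this as a Python sketch. It is not gnuplot code; the sample data and the helper names column_maxima and normalize are made up, and every column maximum is assumed to be positive.

```python
# Invented sample data: each row is one record, each column one experiment.
rows = [
    [1.0, 10.0, 3.0],
    [2.0, 40.0, 6.0],
    [4.0, 20.0, 12.0],
]

def column_maxima(rows):
    """Compute the maximum of every column, one pass per column."""
    return [max(col) for col in zip(*rows)]

def normalize(rows, maxima):
    """Divide each value by its column maximum so every column peaks at 1."""
    return [[v / m for v, m in zip(row, maxima)] for row in rows]

maxima = column_maxima(rows)
scaled = normalize(rows, maxima)
print(maxima)      # [4.0, 40.0, 12.0]
print(scaled[0])   # [0.25, 0.25, 0.25]
```

The loop over columns is the part that currently has to be typed out by hand as one stats command per column.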

>
> So I want to request a different mechanism for storing and retrieving the
> stats values.  You've seen this before, but here it comes again:
> I don't want dozens of variables to be created by every stats command,
> because they are too hard to retrieve inside a script.  Instead I want each
> stats command to load a structure, and I want a set of functions that
> retrieve the previously calculated stats values, indexed by name.  If you
> want to load a named variable from one of the stats values, fine.  Just say
>    Run5_xmin = statmin("Run5")
> That will persist across a save/load sequence, for instance, even though
> the internal stats structures will not.

Why is that goodness? I don't understand the
motivation here. Why do you want to go the
roundabout way (and force the user through
this detour) of accessing variables through
functions, rather than as variables?

The "prefix" that we offer for variable names
serves exactly the same purpose as the data
structure that you refer to: a logical grouping.

>
> 3)  For convenience, the stats command should accept an iterator.
> My plots yesterday could then have been created in two commands:
>
>    stats for [col=5:20] data using col name "Run".col
>    plot  for [col=5:20] data using (column(4)) : \
>          (column(col) / statmax("Run".col)) \
>          title sprintf("Run%d",col)
>
> Note that "Run".col  is the same as sprintf("Run%d",col).
>
> 4) I suggest adding a mechanism for explicitly clearing out the set of
> stats calculations.  One obvious syntax is
>     reset stats

Yes, and in fact Zoltan and I have implemented a
function to do that, but not hooked it up.



Re: unset var [was Re: Problem and solution: Patch for new "stats" command]

by Philipp K. Janert

On Tuesday 10 November 2009 01:00:16 pm Hans-Bernhard Bröker wrote:

> Ethan Merritt wrote:
> [...]
>
> > For one thing, we don't currently have any easy way to undefine all
> > these variables.  "show var GPSTAT" would show all of them, but
> > "undefine GPSTAT" doesn't get rid of them.  That's fixable, I suppose,
> > but I don't like the variables for another reason...
> >
> > 4) I suggest adding a mechanism for explicitly clearing out the set of
> > stats calculations.  One obvious syntax is
> >     reset stats
>
> This one deserves generalization.  Unless I missed something, we
> currently don't have a command to get rid of any variable (short of
> 'exit' and starting a new session), whereas the number of variables we
> have gnuplot create by itself seems to be increasing all the time (first
> "fit" results, then GPVAL_*, now possibly GPSTAT_*).
>
> I think a generic
>
> unset variable {<name>| pattern <regex> | fit | stats}
>

I fully agree. Zoltan and I discussed this and even
implemented a function that will do that. We just
don't currently expose it as a command - partially
because we did not want to add yet another user-level
command.

But the idea of overloading unset for this purpose is
a good one.

> command would be in order.  I would prefer 'unset' over 'reset' here
> because it matches 'show variables' a bit better than 'reset'.  It
> should probably be extended to user-defined functions, too.  And maybe
> we should even allow
>
> set variable <var>=<expr>
> and
> set function <name>(<arguments>)=<expression>
>
> as an optional syntax instead of the usual <var>=<expr> etc., too.




Re: unset var [was Re: Problem and solution: Patch for new "stats" command]

by Philipp K. Janert

On Tuesday 10 November 2009 01:32:06 pm Ethan Merritt wrote:
> On Tuesday 10 November 2009 13:00:16 Hans-Bernhard Bröker wrote:
> > This one deserves generalization.  Unless I missed something, we
> > currently don't have a command to get rid of any variable (short of
> > 'exit' and starting a new session),
>
> Yeah, we do.  You can
>    undefine VARNAME
> But it doesn't take wildcards.

Side question: is that behavior documented
somewhere? I did not know about it, either.

>
> Perhaps it should do the same as the "show var PREFIX" command,
> and treat the string as a leading prefix to the full variable name.

That's what we were thinking. (I will admit that
I am reluctant to implement a regular expression
parser. It seems like overkill.)

I tend to prefer "unset" over "reset" - simply
because I am already used to "unset" taking
arguments, whereas "reset" does not.

>
> > whereas the number of variables we
> > have gnuplot create by itself seems to be increasing all the time (first
> > "fit" results, then GPVAL_*, now possibly GPSTAT_*).
> >
> > I think a generic
> >
> > unset variable {<name>| pattern <regex> | fit | stats}
> >
> > command would be in order.  I would prefer 'unset' over 'reset' here
> > because it matches 'show variables' a bit better than 'reset'.
>
> Hmm.  But it isn't  "set var foo",  it's "var = foo".
> So "unset" is misleading.
>
> > It should probably be extended to user-defined functions, too.
>
> Yes.  Good point.
>
> But there is a possible "gotcha".  If you undefine a function that is
> called by another previously-defined function, bad things could happen.
>
> > And maybe
> > we should even allow
> >
> > set variable <var>=<expr>
> > and
> > set function <name>(<arguments>)=<expression>
> >
> > as an optional syntax instead of the usual <var>=<expr> etc., too.
>
> Yes, that would be the other way to justify use of "unset" :-)
> But it seems a more drastic change than extending "undefine".


