Announcing statistical inference package 'stats'

by Mario Rodriguez

Dear all,

I have just committed the new package 'stats' to CVS; help files in
English and Spanish were also added, together with the test file.

This is the list of included procedures (which should grow in the
future):

* mean_test
* dif_means_test
* variance_test
* variance_ratio_test
* sign_test
* signed_rank_test
* rank_sum_test (or Wilcoxon-Mann-Whitney test)
* shapiro_wilk_test (to check for normality)
* simple_linear_reg

This package also defines the Maxima object 'inference_result', which
stores the results of all the computations, although only a subset of
them (the most commonly needed ones) is displayed by default. See the
help files to learn how to obtain the complete set of results.

I have been inspired to some extent by the R statistical package: the
idea of the 'inference_result' was taken from R, and some purely
numerical algorithms are the same as those used by R (translated from C
or Fortran to Lisp; in fact, some C code in R is itself a translation
of earlier algorithms written in Fortran).

I have not implemented the R 'frame' concept, but the door is open to
add it in the future. Samples can be stored in Maxima lists or matrices
and given as arguments to the functions of package 'stats'.
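
As a rough sketch of what a session could look like (the sample values
are invented for illustration, and I am assuming here that the package
loads as "stats" and that mean_test accepts the sample as its only
required argument; see the help files for the actual calling
conventions):

```maxima
/* Illustrative session only; the data values are made up. */
load("stats")$

/* A sample stored as a Maxima list */
x : [5.2, 4.8, 5.5, 5.1, 4.9, 5.3]$

/* The same sample stored as a one-column matrix */
m : transpose(matrix(x))$

/* Assumed call form: the sample is passed as the first argument */
mean_test(x);
```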

I'm not sure I made the correct decisions while writing this package;
I'm open to comments (including on the help file, since I'm not a good
English writer).

Tests passed in clisp, cmucl and sbcl.

Files are in
http://maxima.cvs.sourceforge.net/maxima/maxima/share/contrib/stats/

Best wishes

Mario

--
Mario Rodriguez Riotorto
www.biomates.net
_______________________________________________
Maxima mailing list
Maxima@...
http://www.math.utexas.edu/mailman/listinfo/maxima

Re: Announcing statistical inference package 'stats'

by Robert Dodier-2

Mario,

Thanks a lot for writing the stats package.
I appreciate your dedication to the project.

As the major aspects of the package are very good,
I'll restrict myself to some minor quibbling.

(1) I don't think it's appropriate to modify global variables by
    loading the stats package. This can lead to bad surprises,
    e.g. interactions with other packages, or unexpected results.

(2) About numer in particular, numer : true defeats one of Maxima's
    major features. We really shouldn't discourage people from
    exploiting Maxima's capability to do exact integer and rational
    arithmetic.

    If some functions in the stats package need to convert non-floats
    to floats, then (LET (($NUMER T)) ..) or block([numer : true], ...)
    is the way to go.

(3) I think it is a good idea to present the results in an
    inference_result object. I like the way the results are presented
    in a nice format by a display function.

    A possibility here is to use the existing (though not quite
    finished) defstruct code to construct the inference_result objects.
    Then the methods for accessing fields within a structure don't
    need to be duplicated.

(4) I think the written documentation is very good; every share package
    should have such nice documentation. I'll make some minor revisions
    to the texinfo file.

(5) I recommend renaming shapiro_wilk_test --> test_normality and
    making shapiro_wilk an option (since there are other normality
    tests)

(6) I recommend renaming dif_means_test --> means_difference_test
    or means_diff_test

(7) I recommend renaming simple_linear_reg --> linear_regression
    or simple_linear_regression

(8) (MAYBE) Test functions could be renamed in big-endian style,
    to give these similar functions names which are more similar.
    It's a minor point.

    mean_test --> test_mean
    means_difference_test --> test_means_difference
    variance_test --> test_variance
    variance_ratio_test --> test_variance_ratio
    sign_test --> test_sign
    signed_rank_test --> test_signed_rank
    normality_test --> test_normality


Thanks again, & all the best,

Robert

Re: Announcing statistical inference package 'stats'

by Mario Rodriguez

Hello Robert,

> (1) I don't think it's appropriate to modify global variables by
>     loading the stats package. This can lead to bad surprises,
>     e.g. interactions with other packages, or unexpected results.
>
> (2) About numer in particular, numer : true defeats one of Maxima's
>     major features. We really shouldn't discourage people from
>     exploiting Maxima's capability to do exact integer and rational
>     arithmetic.
>
>     If some functions in the stats package need to convert non-floats
>     to floats, then (LET (($NUMER T)) ..) or block([numer : true], ...)
>     is the way to go.
>

I understand objection (1). But I think that a typical user of this
package will mostly be interested in floating point results; if these
are given in rational form, the user is obliged to write '%,numer' most
of the time. Moreover, nobody needs a p-value with sixteen digits, which
is why I restrict fpprintprec to 7. Finally, with the global variables
'numer' and 'fpprintprec' set to their default values, the displayed
inference_result object is very ugly.

I propose a third alternative. Let's define two new global variables,
'stats_numer' (default true) and 'stats_fpprint' (default 7), and leave
the other two unchanged globally.
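
A minimal sketch of how a function could honour these two variables
without touching the globals, using the block binding you mention. The
helper name show_mean is hypothetical, and stats_numer and
stats_fpprint exist only as the proposal above, so this is an
illustration and not code from the package:

```maxima
/* Sketch only: stats_numer and stats_fpprint are the proposed
   variables; show_mean is a hypothetical helper. */
stats_numer : true$
stats_fpprint : 7$

show_mean(xs) := block(
  /* numer and fpprintprec are rebound only inside this block;
     the user's global values are restored when the block exits */
  [numer : stats_numer, fpprintprec : stats_fpprint],
  print("mean =", apply("+", xs) / length(xs)),
  done)$

show_mean([1, 2, 4]);
```

A user who prefers exact rational results would then set stats_numer
to false, and the global 'numer' would never be modified by loading
the package.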

> (3) I think it is a good idea to present the results in an
>     inference_result object. I like the way the results are presented
>     in a nice format by a display function.

I like it too, but the original idea of porting this from R to Maxima is
not mine ;)

>     A possibility here is to use the existing (though not quite
>     finished) defstruct code to construct the inference_result objects.
>     Then the methods for accessing fields within a structure don't
>     need to be duplicated.

Unrelated to the stats package: some months ago, I was studying how to
use 'defstruct' in the distrib package, to make it similar to the
Mathematica style of defining distributions; for example, the idea was
to write something like

cdf(1/2, normal_distribution(0,1));

instead of

cdf_normal(1/2,0,1);

but I wasn't sure about the benefits of this syntax, and gave up.


> (4) I think the written documentation is very good; every share package
>     should have such nice documentation. I'll make some minor revisions
>     to the texinfo file.

Please do.

> (5) I recommend renaming shapiro_wilk_test --> test_normality and
>     making shapiro_wilk an option (since there are other normality
>     tests)
>
> (6) I recommend renaming dif_means_test --> means_difference_test
>     or means_diff_test
>
> (7) I recommend renaming simple_linear_reg --> linear_regression
>     or simple_linear_regression
>
> (8) (MAYBE) Test functions could be renamed in big-endian style,
>     to give these similar functions names which are more similar.
>     It's a minor point.
>
>     mean_test --> test_mean
>     means_difference_test --> test_means_difference
>     variance_test --> test_variance
>     variance_ratio_test --> test_variance_ratio
>     sign_test --> test_sign
>     signed_rank_test --> test_signed_rank
>     normality_test --> test_normality


Ok, I'll put changes (5), (6), (7), and (8) in my todo list.

Thanks for your comments. I'm interested in reading these and other
opinions before writing more tests.

Mario

--
Mario Rodriguez Riotorto
www.biomates.net