|
View:
New views
13 Messages
—
Rating Filter:
Alert me
|
|
|
Covariance MatrixI've been trying to hack up a reliable CORRELATIONS command.
It seemed logical to me to use the existing covariance-matrix.c in src/math. Whilst this worked for simple examples, it gave completely wrong answers (or at least different from the spss examples I could find) when presented with either a) missing values; or b) non-unity caseweights. I've started digging into covariance-matrix.c in order to fix the problems, but the exercise is slowly evolving into a complete rewrite of the module, and in doing so, I'm not confident that I'm not breaking some of the functionality which correlations doesn't use (notably interactions). Of course, we could just go with 2 implementations of covariance-matrix - one which works with interactions, but not with missing values, and the other works with missing values, but doesn't support interactions, but I don't think this is a sensible way to proceed. As things stand right now, I don't think there's anything in master which actually uses interactions, but I don't know how far Jason has got with GLM. So I guess the question is what is the best way to proceed? As I see it, the options are: a) Fork the covariance matrix implementation and have 2 different ones in master - not a good idea I think. b) Get a working CM implementation which properly handles missing values and case-weights. When this is working, we can add support for interactions later. c) Get the CM working properly with all features, including weights, missing values and interactions, before proceeding further. My opinion is that we should go for (b), but that's largely motivated by personal interests, and I don't know how it's going to affect other developers. Comments ? -- PGP Public key ID: 1024D/2DE827B3 fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3 See http://pgp.mit.edu or any PGP keyserver for public key. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixOn Sun, Sep 27, 2009 at 02:44:35PM +0800, John Darrington wrote:
> I've been trying to hack up a reliable CORRELATIONS command. > It seemed logical to me to use the existing covariance-matrix.c in > src/math. Whilst this worked for simple examples, it gave completely > wrong answers (or at least different from the spss examples I could > find) when presented with either a) missing values; or b) non-unity > caseweights. This is something I've been meaning to fix, but haven't gotten to it. > I've started digging into covariance-matrix.c in order to fix the > problems, but the exercise is slowly evolving into a complete rewrite > of the module, and in doing so, I'm not confident that I'm not > breaking some of the functionality which correlations doesn't use > (notably interactions). > > Of course, we could just go with 2 implementations of covariance-matrix > - one which works with interactions, but not with missing values, and > the other works with missing values, but doesn't support interactions, > but I don't think this is a sensible way to proceed. > > As things stand right now, I don't think there's anything in master which > actually uses interactions, but I don't know how far Jason has got with > GLM. Still working on it, though slowly. > So I guess the question is what is the best way to proceed? As I see it, > the options are: > > a) Fork the covariance matrix implementation and have 2 different ones > in master - not a good idea I think. > > b) Get a working CM implementation which properly handles missing values > and case-weights. When this is working, we can add support for interactions > later. > > c) Get the CM working properly with all features, including weights, missing > values and interactions, before proceeding further. > > > My opinion is that we should go for (b), but that's largely motivated by > personal interests, and I don't know how it's going to affect other > developers. > > Comments ? I don't want to drop support for interactions entirely, but I think the way it's done now is ugly and cumbersome. But it wouldn't be so ugly if done in two data passes, instead of one. So how about this: Start working (b), but with an algorithm that works with two data passes. I'll add the interactions to the two-pass algorithm later. In fact, the easiest way to handle interactions might be to use a simple two-pass algorithm inside another module written specifically for computing covariances with interactions. That would keep the actual computation of the covariance cleaner, and push elsewhere the ugly computations involving the interactions. So, the way I'm thinking now is that we should have something like this: struct interaction_covariance { ...blah blah struct covariance_matrix; }; instead of this (what we have now): struct covariance_matrix { ...blah blah const struct interaction_variable **interactions; }; Then the code in covariance-matrix.c doesn't need to know anything about interactions. But it will have to know about categorical variables. For a two-pass algorithm, I don't think that will be a big deal. Much of the current ugliness in covariance-matrix.c is due to its single data pass. I thought the benefit of one data pass would offset the ugliness. But one data pass isn't as stable as two, and the ugly hash tables and the convoluted code make the current covariance-matrix.c a wreck. So I would say: Rewrite away. If possible, please keep support for categorical variables. But as I said, I don't think that will be hard to add later for a two-pass algorithm. Ugly hashing will be unnecessary. I wish I hadn't been so stubborn about having a one-pass algorithm in the first place. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixGiven that I like solution b, how should we proceed? Do you want to
erase most or all of covariance-matrix.c? Should I do it, or avoid it? Either way is OK with me. I think one of the first thing to do is get rid of any hashing business. No more interactions, too. I ask because I was the one who wrote all the crap in covariance-matrix.c in the first place, so on one hand, I kind of feel responsible, but on the other, writing crap doesn't exactly enhance one's confidence or motivation. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixI'm happy to do it, but I'm not sure when I'll be able to find an
uninterrupted period of time which it requires. So I suggest that if you have other development tasks to do, then do those, otherwise feel free to rewrite covariance matrix yourself. I'm not adverse to using hash tables in covariance matrix if they are needed, but I don't think they are necessary for the simple case (without interactions/categorical variables). J' On Mon, Sep 28, 2009 at 08:10:58PM -0400, Jason Stover wrote: Given that I like solution b, how should we proceed? Do you want to erase most or all of covariance-matrix.c? Should I do it, or avoid it? Either way is OK with me. I think one of the first thing to do is get rid of any hashing business. No more interactions, too. I ask because I was the one who wrote all the crap in covariance-matrix.c in the first place, so on one hand, I kind of feel responsible, but on the other, writing crap doesn't exactly enhance one's confidence or motivation. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev -- PGP Public key ID: 1024D/2DE827B3 fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3 See http://pgp.mit.edu or any PGP keyserver for public key. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixI've pushed an implementation of CORRELATIONS, along with a new
covariance matrix implementation in src/math/covariance.[ch] It actually calculates more than the covariance matrix: it also calculates the 0th 1st and 2nd moments, since these are necessary steps to obtain the cov. It's currently implemented as a single pass algorithm, but it will be straightforward to change that. There's no categorical variables, or interactions, at present. I'm fairly confident that it's giving the right values and works good with both missing values and with non-unity caseweights. To test the former, I tried the examples at http://www.ats.ucla.edu/stat/Spss/modules/missing.htm and compared my results with those published. For the latter, I have a self-consistency test in tests/command/correlation.sh I'd be interested in any comments and suggestions on how to proceed with generalising the implementation to accept categorical variables. Once that has been done, I think it would be a good idea to re-implement T-TEST and ONEWAY using the new module - which I hope would vanquish several thousand lines of rather messy code. J' On Mon, Sep 28, 2009 at 08:10:58PM -0400, Jason Stover wrote: Given that I like solution b, how should we proceed? Do you want to erase most or all of covariance-matrix.c? Should I do it, or avoid it? Either way is OK with me. I think one of the first thing to do is get rid of any hashing business. No more interactions, too. I ask because I was the one who wrote all the crap in covariance-matrix.c in the first place, so on one hand, I kind of feel responsible, but on the other, writing crap doesn't exactly enhance one's confidence or motivation. -- PGP Public key ID: 1024D/2DE827B3 fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3 See http://pgp.mit.edu or any PGP keyserver for public key. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixOn Sun, Oct 04, 2009 at 03:21:51PM +0000, John Darrington wrote:
> It's currently implemented as a single pass algorithm, but it will > be straightforward to change that. There's no categorical variables, > or interactions, at present. > ... > I'd be interested in any comments and suggestions on how to proceed > with generalising the implementation to accept categorical variables. I have a patch below to start this. Right now, it just allocates space. It does not properly retrieve entries with categorical variables, but it is a start: diff --git a/src/language/stats/correlations.c b/src/language/stats/correlations.c index e397dae..65679f3 100644 --- a/src/language/stats/correlations.c +++ b/src/language/stats/correlations.c @@ -324,7 +324,7 @@ run_corr (struct casereader *r, const struct corr_opts *opts, const struct corr const gsl_matrix *var_matrix, *samples_matrix, *mean_matrix; const gsl_matrix *cov_matrix; gsl_matrix *corr_matrix; - struct covariance *cov = covariance_create (corr->n_vars_total, corr->vars, + struct covariance *cov = covariance_create_pass1 (corr->n_vars_total, corr->vars, opts->wv, opts->exclude); for ( ; (c = casereader_read (r) ); case_unref (c)) diff --git a/src/math/covariance.c b/src/math/covariance.c index ba0de0b..350cab4 100644 --- a/src/math/covariance.c +++ b/src/math/covariance.c @@ -31,6 +31,7 @@ struct covariance { /* The variables for which the covariance matrix is to be calculated */ size_t n_vars; + size_t dim; /* This value equals n_vars if all variables are continuous. */ const struct variable **vars; /* The weight variable (or NULL if none) */ @@ -64,11 +65,29 @@ covariance_moments (const struct covariance *cov, int m) } +static void +covariance_create_part2 (struct covariance *cov, enum mv_class exclude) +{ + size_t i; + + cov->moments = xmalloc (sizeof *cov->moments * n_MOMENTS); + + for (i = 0; i < n_MOMENTS; ++i) + cov->moments[i] = gsl_matrix_calloc (cov->dim, cov->dim); -/* Create a covariance struct */ + cov->exclude = exclude; + + cov->n_cm = (cov->dim * (cov->dim - 1) ) / 2; + + cov->cm = xcalloc (sizeof *cov->cm, cov->n_cm); +} + +/* Create a covariance struct to be computed in one data pass. + No categorical variables are allowed. +*/ struct covariance * -covariance_create (size_t n_vars, const struct variable **vars, - const struct variable *weight, enum mv_class exclude) +covariance_create_pass1 (size_t n_vars, const struct variable **vars, + const struct variable *weight, enum mv_class exclude) { size_t i; struct covariance *cov = xmalloc (sizeof *cov); @@ -76,20 +95,61 @@ covariance_create (size_t n_vars, const struct variable **vars, cov->wv = weight; cov->n_vars = n_vars; + cov->dim = n_vars; /* Only numeric variables are allowed in a single data pass, + so these values are equal. + */ for (i = 0; i < n_vars; ++i) - cov->vars[i] = vars[i]; + { + assert (var_is_numeric (vars[i])); + cov->vars[i] = vars[i]; + } - cov->moments = xmalloc (sizeof *cov->moments * n_MOMENTS); - - for (i = 0; i < n_MOMENTS; ++i) - cov->moments[i] = gsl_matrix_calloc (n_vars, n_vars); + covariance_create_part2 (cov, exclude); - cov->exclude = exclude; + return cov; +} - cov->n_cm = (n_vars * (n_vars - 1) ) / 2; +static size_t +get_dim (size_t n_vars, struct variable **vars) +{ + size_t i; + size_t dim = 0; - cov->cm = xcalloc (sizeof *cov->cm, cov->n_cm); + for (i = 0; i < n_vars; i++) + { + if (var_is_numeric (vars[i])) + { + dim++; + } + else + { + dim += cat_get_n_categories (vars[i]); + } + } + return dim; +} +/* Create a covariance struct with categorical variables. + Call this function after the first data pass. +*/ +struct covariance * +covariance_create_pass2 (size_t n_vars, const struct variable **vars, + const struct variable *weight, enum mv_class exclude) +{ + size_t i; + struct covariance *cov = xmalloc (sizeof *cov); + cov->vars = xmalloc (sizeof *cov->vars * n_vars); + + cov->wv = weight; + cov->n_vars = n_vars; + + for (i = 0; i < n_vars; ++i) + cov->vars[i] = vars[i]; + + cov->dim = get_dim (n_vars, vars); + + + covariance_create_part2 (cov, exclude); return cov; } diff --git a/src/math/covariance.h b/src/math/covariance.h index 8b8de88..7a13cd2 100644 --- a/src/math/covariance.h +++ b/src/math/covariance.h @@ -27,8 +27,14 @@ struct covariance; struct variable; struct ccase ; -struct covariance * covariance_create (size_t n_vars, const struct variable **vars, - const struct variable *wv, enum mv_class excl); +struct covariance * covariance_create_pass1 (size_t n_vars, + const struct variable **vars, + const struct variable *wv, + enum mv_class excl); +struct covariance * covariance_create_pass2 (size_t n_vars, + const struct variable **vars, + const struct variable *wv, + enum mv_class excl); void covariance_accumulate (struct covariance *, const struct ccase *); _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixThanks for doing this. I think I see what this is doing. I have a few comments:
1. It provokes some compiler warnings (which can be trivially fixed). 2. I know we've discussed this before, but I'm still not comfortable with the implicit assumption that catagorical variables are always non-numeric variables. Although I accept that using numeric variables as categorical ones is perhaps inadvisable, it is a legacy that we are stuck with. Both the T-TEST and ONEWAY commands permit the categorical variable to be numeric (in fact, I think ONEWAY insists upon it). So I think we have to support numerical categorical variables. 3. Perhaps related to 2. Our current practice of storing the categorical state in the variables' aux parameter is problematic, and the get_dim function reminds us of this. The loop in get_dim should not be necessary, since the total number of categories has already been calculated. I suggest that we change the interface and implementation of src/data/category.[ch] such that it stores its state in a separate object. Although it means a little more work for the caller, I think the benefits will be worthwhile. Anyway, I suggest that we create a new branch and check in Jason's patch into it and work on a categorical covariance implementation within that branch. J' On Tue, Oct 06, 2009 at 11:37:40AM -0400, Jason Stover wrote: On Sun, Oct 04, 2009 at 03:21:51PM +0000, John Darrington wrote: > It's currently implemented as a single pass algorithm, but it will > be straightforward to change that. There's no categorical variables, > or interactions, at present. > ... > I'd be interested in any comments and suggestions on how to proceed > with generalising the implementation to accept categorical variables. I have a patch below to start this. Right now, it just allocates space. It does not properly retrieve entries with categorical variables, but it is a start: diff --git a/src/language/stats/correlations.c b/src/language/stats/correlations.c index e397dae..65679f3 100644 --- a/src/language/stats/correlations.c +++ b/src/language/stats/correlations.c @@ -324,7 +324,7 @@ run_corr (struct casereader *r, const struct corr_opts *opts, const struct corr const gsl_matrix *var_matrix, *samples_matrix, *mean_matrix; const gsl_matrix *cov_matrix; gsl_matrix *corr_matrix; - struct covariance *cov = covariance_create (corr->n_vars_total, corr->vars, + struct covariance *cov = covariance_create_pass1 (corr->n_vars_total, corr->vars, opts->wv, opts->exclude); for ( ; (c = casereader_read (r) ); case_unref (c)) diff --git a/src/math/covariance.c b/src/math/covariance.c index ba0de0b..350cab4 100644 --- a/src/math/covariance.c +++ b/src/math/covariance.c @@ -31,6 +31,7 @@ struct covariance { /* The variables for which the covariance matrix is to be calculated */ size_t n_vars; + size_t dim; /* This value equals n_vars if all variables are continuous. */ const struct variable **vars; /* The weight variable (or NULL if none) */ @@ -64,11 +65,29 @@ covariance_moments (const struct covariance *cov, int m) } +static void +covariance_create_part2 (struct covariance *cov, enum mv_class exclude) +{ + size_t i; + + cov->moments = xmalloc (sizeof *cov->moments * n_MOMENTS); + + for (i = 0; i < n_MOMENTS; ++i) + cov->moments[i] = gsl_matrix_calloc (cov->dim, cov->dim); -/* Create a covariance struct */ + cov->exclude = exclude; + + cov->n_cm = (cov->dim * (cov->dim - 1) ) / 2; + + cov->cm = xcalloc (sizeof *cov->cm, cov->n_cm); +} + +/* Create a covariance struct to be computed in one data pass. + No categorical variables are allowed. +*/ struct covariance * -covariance_create (size_t n_vars, const struct variable **vars, - const struct variable *weight, enum mv_class exclude) +covariance_create_pass1 (size_t n_vars, const struct variable **vars, + const struct variable *weight, enum mv_class exclude) { size_t i; struct covariance *cov = xmalloc (sizeof *cov); @@ -76,20 +95,61 @@ covariance_create (size_t n_vars, const struct variable **vars, cov->wv = weight; cov->n_vars = n_vars; + cov->dim = n_vars; /* Only numeric variables are allowed in a single data pass, + so these values are equal. + */ for (i = 0; i < n_vars; ++i) - cov->vars[i] = vars[i]; + { + assert (var_is_numeric (vars[i])); + cov->vars[i] = vars[i]; + } - cov->moments = xmalloc (sizeof *cov->moments * n_MOMENTS); - - for (i = 0; i < n_MOMENTS; ++i) - cov->moments[i] = gsl_matrix_calloc (n_vars, n_vars); + covariance_create_part2 (cov, exclude); - cov->exclude = exclude; + return cov; +} - cov->n_cm = (n_vars * (n_vars - 1) ) / 2; +static size_t +get_dim (size_t n_vars, struct variable **vars) +{ + size_t i; + size_t dim = 0; - cov->cm = xcalloc (sizeof *cov->cm, cov->n_cm); + for (i = 0; i < n_vars; i++) + { + if (var_is_numeric (vars[i])) + { + dim++; + } + else + { + dim += cat_get_n_categories (vars[i]); + } + } + return dim; +} +/* Create a covariance struct with categorical variables. + Call this function after the first data pass. +*/ +struct covariance * +covariance_create_pass2 (size_t n_vars, const struct variable **vars, + const struct variable *weight, enum mv_class exclude) +{ + size_t i; + struct covariance *cov = xmalloc (sizeof *cov); + cov->vars = xmalloc (sizeof *cov->vars * n_vars); + + cov->wv = weight; + cov->n_vars = n_vars; + + for (i = 0; i < n_vars; ++i) + cov->vars[i] = vars[i]; + + cov->dim = get_dim (n_vars, vars); + + + covariance_create_part2 (cov, exclude); return cov; } diff --git a/src/math/covariance.h b/src/math/covariance.h index 8b8de88..7a13cd2 100644 --- a/src/math/covariance.h +++ b/src/math/covariance.h @@ -27,8 +27,14 @@ struct covariance; struct variable; struct ccase ; -struct covariance * covariance_create (size_t n_vars, const struct variable **vars, - const struct variable *wv, enum mv_class excl); +struct covariance * covariance_create_pass1 (size_t n_vars, + const struct variable **vars, + const struct variable *wv, + enum mv_class excl); +struct covariance * covariance_create_pass2 (size_t n_vars, + const struct variable **vars, + const struct variable *wv, + enum mv_class excl); void covariance_accumulate (struct covariance *, const struct ccase *); _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev -- PGP Public key ID: 1024D/2DE827B3 fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3 See http://pgp.mit.edu or any PGP keyserver for public key. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixOn Tue, Oct 06, 2009 at 07:43:06PM +0000, John Darrington wrote:
> .... The loop in get_dim should not be necessary, > since the total number of categories has already been calculated... Where? _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixOn Wed, Oct 07, 2009 at 08:54:40AM -0400, Jason Stover wrote:
On Tue, Oct 06, 2009 at 07:43:06PM +0000, John Darrington wrote: > .... The loop in get_dim should not be necessary, > since the total number of categories has already been calculated... Where? I should have written this more precisely: There exists the *opportunity* to pre-calculate the total number of categories (eg. in line 141 of category.c) - This would make the loop in get_dim unnecessary and the function a whole lot simpler. J' -- PGP Public key ID: 1024D/2DE827B3 fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3 See http://pgp.mit.edu or any PGP keyserver for public key. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixOn Wed, Oct 07, 2009 at 05:01:37PM +0000, John Darrington wrote:
> On Wed, Oct 07, 2009 at 08:54:40AM -0400, Jason Stover wrote: > > On Tue, Oct 06, 2009 at 07:43:06PM +0000, John Darrington wrote: > > .... The loop in get_dim should not be necessary, > > since the total number of categories has already been calculated... > > Where? > > I should have written this more precisely: > > There exists the *opportunity* to pre-calculate the total number of categories (eg. > in line 141 of category.c) - This would make the loop in get_dim unnecessary and the > function a whole lot simpler. Sort-of: Line 141 of category.c updates the total number of categories for a single variable: cv->n_categories++; But that isn't the total number of categories that must be considered for the covariance matrix. For example, if our dictionary had the variables v1, v2 and v3, and v1 had n1 categories, v2 had n2 categories and v3 had n3 categories, then our covariance matrix would need to have (n1 - 1)*(n2 - 1)*(n3 - 1) rows. I guess we could compute this value in category.c, but what if someone only wanted a covariance matrix for just v1 and v2? Then the the number of rows necessary would be (n1 - 1) * (n2 - 1). Or maybe they would want v2 and v3, then we would need (n2 - 1)*(n3 - 1) rows. There is no way to know in advance how many rows we would need, unless we just compute the covariance matrix for all the variables. Which could be OK, I guess. Did you mean to always compute a covariance matrix for all the variables in the dictionary? BTW: Given what I just mentioned about the dimension necessary, that loop would have to be: for (i = 0; i < n_vars; i++) { if (var_is_numeric (vars[i])) { dim++; } else { dim += cat_get_n_categories (vars[i]) - 1; } } ...instead of just dim += cat_get_n_categories (vars[i]); _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixOn Wed, Oct 07, 2009 at 03:17:55PM -0400, Jason Stover wrote:
> But that isn't the total number of categories that must be considered > for the covariance matrix. For example, if our dictionary had the > variables v1, v2 and v3, and v1 had n1 categories, v2 had n2 > categories and v3 had n3 categories, then our covariance matrix would > need to have (n1 - 1)*(n2 - 1)*(n3 - 1) rows. I guess we could compute > this value in category.c, but what if someone only wanted a covariance > matrix for just v1 and v2? Then the the number of rows necessary would be > (n1 - 1) * (n2 - 1). Or maybe they would want v2 and v3, then we would > need (n2 - 1)*(n3 - 1) rows. I made a mistake here: The dimensions would be (n1-1)+(n2-1)+(n3-1), or (n1-1)+(n2-1), etc., but not (n1-1)*(n2-1). Oops. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixOn Wed, Oct 07, 2009 at 07:35:10PM -0400, Jason Stover wrote:
On Wed, Oct 07, 2009 at 03:17:55PM -0400, Jason Stover wrote: > But that isn't the total number of categories that must be considered > for the covariance matrix. For example, if our dictionary had the > variables v1, v2 and v3, and v1 had n1 categories, v2 had n2 > categories and v3 had n3 categories, then our covariance matrix would > need to have (n1 - 1)*(n2 - 1)*(n3 - 1) rows. I guess we could compute > this value in category.c, but what if someone only wanted a covariance > matrix for just v1 and v2? Then the the number of rows necessary would be > (n1 - 1) * (n2 - 1). Or maybe they would want v2 and v3, then we would > need (n2 - 1)*(n3 - 1) rows. I made a mistake here: The dimensions would be (n1-1)+(n2-1)+(n3-1), or (n1-1)+(n2-1), etc., but not (n1-1)*(n2-1). Oops. My idea is to calculate the number of categories on each procedure (that needs it). So instead of the categories data having global scope, they will be created on demand (in this case inside the covariance matrix constructure). So the constructor for the covariance matrix would be passed (in addition to its current parameters) a set of categorical variables. The total number of categories can be calculated in the first pass. Thus the code would become something like: create_covariance (const struct variable **vars, int n_vars, const struct variable **catvars, int n_cat_vars) { ... cov->categories = create_categories (catvars, n_cat_vars); } void covariance_first_pass (struct covariance *cov, const struct ccase *c) { categories_update (cov->categories, c); ... } int get_dim (const struct covariance *cov) { return cov->n_vars + categories_total_number_of_categories (cov->categories); } This involves rewriting some code, but I think it will be worthwhile. We don't need the total number of categories until AFTER the first pass of the procedure. -- PGP Public key ID: 1024D/2DE827B3 fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3 See http://pgp.mit.edu or any PGP keyserver for public key. _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
|
|
Re: Covariance MatrixOK, I see your point now, and it makes sense. On Thu, Oct 08, 2009 at 11:34:54AM +0000, John Darrington wrote: > On Wed, Oct 07, 2009 at 07:35:10PM -0400, Jason Stover wrote: > On Wed, Oct 07, 2009 at 03:17:55PM -0400, Jason Stover wrote: > > But that isn't the total number of categories that must be considered > > for the covariance matrix. For example, if our dictionary had the > > variables v1, v2 and v3, and v1 had n1 categories, v2 had n2 > > categories and v3 had n3 categories, then our covariance matrix would > > need to have (n1 - 1)*(n2 - 1)*(n3 - 1) rows. I guess we could compute > > this value in category.c, but what if someone only wanted a covariance > > matrix for just v1 and v2? Then the the number of rows necessary would be > > (n1 - 1) * (n2 - 1). Or maybe they would want v2 and v3, then we would > > need (n2 - 1)*(n3 - 1) rows. > > I made a mistake here: The dimensions would be (n1-1)+(n2-1)+(n3-1), > or (n1-1)+(n2-1), etc., but not (n1-1)*(n2-1). Oops. > > My idea is to calculate the number of categories on each procedure (that needs it). > So instead of the categories data having global scope, they will be created on demand > (in this case inside the covariance matrix constructure). > > So the constructor for the covariance matrix would be passed (in addition to its > current parameters) a set of categorical variables. The total number of categories > can be calculated in the first pass. Thus the code would become something like: > > create_covariance (const struct variable **vars, int n_vars, > const struct variable **catvars, int n_cat_vars) > { > ... > cov->categories = create_categories (catvars, n_cat_vars); > } > > void > covariance_first_pass (struct covariance *cov, const struct ccase *c) > { > categories_update (cov->categories, c); > ... > } > > int > get_dim (const struct covariance *cov) > { > return cov->n_vars + categories_total_number_of_categories (cov->categories); > } > > > This involves rewriting some code, but I think it will be worthwhile. We don't > need the total number of categories until AFTER the first pass of the procedure. > > -- > PGP Public key ID: 1024D/2DE827B3 > fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3 > See http://pgp.mit.edu or any PGP keyserver for public key. > > _______________________________________________ pspp-dev mailing list pspp-dev@... http://lists.gnu.org/mailman/listinfo/pspp-dev |
| Free embeddable forum powered by Nabble | Forum Help |