Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

by Tymek W

Hi,

Could anyone tell me what is wrong:

> length(unique(mydata$myvariable))
[1] 2
>

and in t-test:

(...)
Error in t.test.formula(othervariable ~ myvariable, mydata) :
  grouping factor must have exactly 2 levels
>

I re-checked the code and still don't get what is wrong.

Moreover, there is some strange behavior:

/1 The error seems to be sensitive to NAs: it affects some variables
in the data set with NAs, but not the same variables in the data set
with NAs removed.

/2 It seems to behave differently depending on how the variables are
passed to t.test:

e.g. it happens here: t.test(x~y, dataset) and does not happen here:
t.test(dataset[['x']]~dataset[['y']])

Does anyone have any ideas?

Greetz,
Timo

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

by Marc Schwartz-3

On Jul 9, 2009, at 5:04 PM, Tymek W wrote:


Check the output of:

   na.omit(cbind(mydata$othervariable, mydata$myvariable))

which shows the data actually available to the t test, since
na.omit() removes any row that has a missing value. Your first check
above counted the distinct values before missing data were removed.

The likelihood is that once missing values have been removed, you are
left with only one unique grouping value in mydata$myvariable.
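
A minimal sketch of that check, assuming a made-up data frame (the values below are hypothetical, only the column names follow the thread). It compares the number of distinct grouping values in the raw column with the number left once rows with a missing value in either variable are dropped, which is what the formula interface does under the default na.action:

```r
# Hypothetical data: the grouping variable has two values, but every
# row in group 2 is missing the response.
mydata <- data.frame(
  othervariable = c(1.2, 2.4, 3.1, NA, NA, NA),
  myvariable    = rep(1:2, each = 3)
)

# Check on the raw column: counts distinct values before any NA removal.
n_raw <- length(unique(mydata$myvariable))

# Keep only the rows where both variables are observed.
ok <- complete.cases(mydata$othervariable, mydata$myvariable)

# Number of groups that actually reach the test.
n_used <- nlevels(factor(mydata$myvariable[ok]))

# Observations per group among the usable rows.
counts <- table(mydata$myvariable[ok])

print(n_raw)
print(n_used)
print(counts)
```

With this data n_raw is 2 while n_used is 1, so the raw check passes even though the test only ever sees one group.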

As for your note 2, both calls should behave the same, since both
use the same basic approach. For example:

DF <- data.frame(x = c(1:3, NA, NA, NA), y = rep(1:2, each = 3))

> DF
   x y
1  1 1
2  2 1
3  3 1
4 NA 2
5 NA 2
6 NA 2

# Remove missing data
> na.omit(DF)
  x y
1 1 1
2 2 1
3 3 1

> t.test(x ~ y, data = DF)
Error in t.test.formula(x ~ y, data = DF) :
  grouping factor must have exactly 2 levels

> t.test(DF$x ~ DF$y)
Error in t.test.formula(DF$x ~ DF$y) :
  grouping factor must have exactly 2 levels


If you have a small reproducible example where the two function calls  
behave differently, please post back with it.

HTH,

Marc Schwartz


Re: Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

by Tymek W

Thanks for your hints, but I'm still stuck... In the dataset I mentioned
(N=134) the variable has only 3 NAs and a 41% : 59% split between its
two values, so it doesn't look like the data are the cause...

I changed and simplified my function; it now prints the number of levels
before doing the rest. Here's a "funny" error result:

> myfun(data, 'varname')

 Levels = 2

Error in t.test.formula(data[[nam[v]]] ~ data[[g]]) :
  grouping factor must have exactly 2 levels

...

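I'll paste the simplified code; maybe it will give someone a clue about
what is going wrong:

myfun <- function(data, g) {

        require(stats)

        data <- as.data.frame(data)
        nam <- names(data)
        # one slot per column; coerced to a list once a t.test result is stored
        res <- matrix(NA, ncol(data))

        # levels of the grouping variable g, counted on the raw column
        cat("\n Levels =", nlevels(factor(data[[g]])), "\n\n")

        # t-test of every other column against the grouping variable
        for (v in 1:ncol(data)) {
                if (nam[v] != g) {
                        res[v] <- list(t.test(data[[nam[v]]] ~ data[[g]]))
                }
        }
        res
}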

What is going wrong here?

Greetz,
Timo


2009/7/10 Marc Schwartz <marc_schwartz@...>:

--
pozdrawiam,
Tymek W


Re: Strange t-test error: "grouping factor must have exactly 2 levels" while it does...

by Petr Pikal

Hi,

You have to look at your data. When I ran your function on some
artificial data, I got the expected result:

> myfun(visko,"konc")

 Levels = 2

[[1]]
[1] NA

[[2]]

        Welch Two Sample t-test

data:  data[[nam[v]]] by data[[g]]
t = -1.7778, df = 4.541, p-value = 0.1415
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.861362   2.535362
sample estimates:
mean in group 1 mean in group 2
          6.685          11.848


[[3]]

        Welch Two Sample t-test

data:  data[[nam[v]]] by data[[g]]
t = -2.6074, df = 3.263, p-value = 0.07327
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.070027   0.775027
sample estimates:
mean in group 1 mean in group 2
         2.3275          6.9750

Try

debug(myfun)

and see at which column it gives an error and what all the values look
like immediately before the error.
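
As a non-interactive alternative to stepping through with debug(), here is a rough sketch that reports, for each column, how many groups are left after dropping incomplete rows and any error t.test raises. The helper name check_columns and the data frame d are hypothetical, made up for illustration and not taken from the thread:

```r
# Hypothetical helper: for each non-grouping column, record the number
# of groups remaining after NA removal and the t.test error, if any.
check_columns <- function(data, g) {
  data <- as.data.frame(data)
  others <- setdiff(names(data), g)
  out <- data.frame(column = others,
                    groups_left = NA_integer_,
                    error = NA_character_,
                    stringsAsFactors = FALSE)
  for (i in seq_along(others)) {
    x <- data[[others[i]]]
    # rows where both the response and the grouping value are observed
    ok <- complete.cases(x, data[[g]])
    out$groups_left[i] <- nlevels(factor(data[[g]][ok]))
    # keep the error message instead of stopping at the first failure
    out$error[i] <- tryCatch({
      t.test(x ~ data[[g]])
      NA_character_
    }, error = function(e) conditionMessage(e))
  }
  out
}

# Made-up data: column "b" is observed in one group only.
d <- data.frame(g = rep(1:2, each = 4),
                a = c(1, 2, 3, 4, 2, 4, 6, 9),
                b = c(1, 2, 3, 4, NA, NA, NA, NA))
res <- check_columns(d, "g")
print(res)
```

Any row of the result with groups_left below 2 points at a column where the grouping factor collapses once missing values are removed, even though the grouping variable itself has two levels.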

Regards
Petr


r-help-bounces@... wrote on 10.07.2009 11:40:30:
