removing NA from a data frame

by Sam Steingold-2

Hi,
It appears that deal does not support missing values (NA), so I need to
remove them (NAs) from my data frame.
how do I do this?
(I am very new to R, so a detailed step-by-step
explanation with code samples would be nice).

Some columns (variables) have quite a few NAs, so I would rather drop
the whole column than sacrifice all the rows (observations) which have
NA in that column.
How do I remove a column from a data frame?

Thanks!

--
Sam Steingold (http://www.podval.org/~sds) on Fedora Core release 4 (Stentz)
http://ffii.org http://www.mideasttruth.com http://pmw.org.il
http://www.dhimmi.com http://www.honestreporting.com http://www.jihadwatch.org
Don't hit a man when he's down -- kick him; it's easier.

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: removing NA from a data frame

by Francisco Zagmutt

Hi Sam

If you are new to R it will definitely pay off to start from the basics.
Go to the help menu-> manuals in pdf and select "An Introduction to R".  
After you read that document you will be able to answer your questions :-)

Good luck!

Francisco



Re: removing NA from a data frame

by Sam Steingold-2

> * Francisco J. Zagmutt <trevsnygr28@...> [2006-03-17 21:09:48 +0000]:
>
> Go to the help menu-> manuals in pdf and select "An Introduction to
> R".  After you read that document you will be able to answer your
> questions :-)

I did.  I still need help.

The matter is not so much with "getting things done" (I can probably
write the code - although I would rather not) as with not reinventing
the wheel.

PS. next time you decide to answer my question with "RTFM", please also
    include the number of the page that answers my specific question.

--
Sam Steingold (http://www.podval.org/~sds) on Fedora Core release 4 (Stentz)
http://www.jihadwatch.org http://www.camera.org http://www.mideasttruth.com
http://www.memri.org http://www.palestinefacts.org http://www.savegushkatif.org
Type louder, please.

Re: removing NA from a data frame

by Ben Bolker

Sam Steingold <sds <at> podval.org> writes:

>
> Hi,
> It appears that deal does not support missing values (NA), so I need to
> remove them (NAs) from my data frame.
> how do I do this?
> (I am very new to R, so a detailed step-by-step
> explanation with code samples would be nice).

  If you wanted to remove rows with NAs from data frame X
na.omit(X) would do it.

  In this case I think

X[!sapply(X,function(z)any(is.na(z)))]

 should work, although I haven't tested it.
function(z)any(is.na(z)) looks for any NA values
sapply applies the function to each element
in the list (= column in the data frame) and
returns a vector
! negates the logical vector
[] picks the appropriate elements (=columns) out
of the list (=dataframe)
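
A minimal sketch of the first expression above, run on a small made-up data frame. The names X_clean and has_na and the columns a, b, c are illustrative assumptions, not from the thread:

```r
# Toy data frame: column b contains an NA, columns a and c do not.
X <- data.frame(a = 1:3, b = c(1.5, NA, 3.5), c = c("u", "v", "w"))

# One TRUE/FALSE per column: does that column contain any NA?
has_na <- sapply(X, function(z) any(is.na(z)))

# Single-bracket indexing of a data frame with a logical vector
# selects columns, so this keeps only the NA-free columns.
X_clean <- X[!has_na]

print(names(X_clean))  # "a" "c"
```

Keeping has_na as a separate variable makes it easy to inspect which columns would be dropped before actually dropping them.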

  I haven't tested it.
  Conceivably

X[!sapply(is.na(X),any)]

or

X[sapply(!is.na(X),all)]

 would work too, although I'm not sure.

  Ben

Re: removing NA from a data frame

by Haifeng Xie

If I understand it correctly, something like this should do what you want

x[!apply(x, 1, function(y) any(is.na(y))), ]

where x is the dataframe in question.
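
As a hedged illustration, the row-wise approach can be compared with the built-in helpers mentioned elsewhere in the thread (na.omit, complete.cases). The toy data and the variable names keep, rows_apply, rows_cc and rows_omit are assumptions made for this sketch:

```r
# Toy data frame: rows 2 and 3 each contain an NA.
x <- data.frame(a = c(1, NA, 3), b = c("p", "q", NA))

# Row-wise check: TRUE for rows that contain no NA at all.
keep <- !apply(x, 1, function(y) any(is.na(y)))
rows_apply <- x[keep, ]

# Built-in alternatives that also drop incomplete rows.
rows_cc   <- x[complete.cases(x), ]
rows_omit <- na.omit(x)

print(nrow(rows_apply))  # 1
```

Note that this removes rows (observations), not columns (variables).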

Hope that helps.

Kevin



Re: removing NA from a data frame

by Ben Bolker

Haifeng Xie <xieh <at> wmin.ac.uk> writes:

>
> If I understand it correctly, something like this should do what you want
>
> x[!apply(x, 1, function(y) any(is.na(y))), ]
>
> where x is the dataframe in question.
>
> Hope that helps.
>
> Kevin
>

   I believe he wants to remove *columns* with NAs, not rows
(if he wanted to remove rows then x[complete.cases(x),] would work)

x[,!apply(x,2,function(y)any(is.na(y)))]

or

x[,!apply(is.na(x),2,any)]

 (I wasn't sure one could apply() on columns of a data frame --
I'm always a little uncertain about the matrix <-> data.frame
mapping -- but I tried it and you can.  Now that I think
about it, I don't know why I thought you couldn't.  apply()
on rows would be more likely to be problematic.)

  is.na() turns a data frame into a matrix, so

x[!sapply(is.na(x),any)]  

*does not* work.

x[complete.cases(t(x))]

or t(na.omit(t(x)))

both do, if your data frame is all numeric (the second returns a matrix
rather than a data frame).
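
A small sketch of the column-wise approach, also showing why sapply() over is.na(x) does not give one value per column. The toy data and all variable names are assumptions; the colSums() variant is not from the thread and is shown only as a common alternative:

```r
# Toy all-numeric data frame: only column b contains an NA.
x <- data.frame(a = c(1, 2, 3), b = c(NA, 5, 6), c = c(7, 8, 9))

# is.na() on a data frame returns a logical matrix, not a list of columns.
na_mat <- is.na(x)
print(is.matrix(na_mat))  # TRUE

# Reduce the matrix column by column: TRUE where a column has any NA.
drop_apply <- apply(na_mat, 2, any)

# Alternative (assumption, not from the thread): count NAs per column.
drop_colsums <- colSums(na_mat) > 0

# drop = FALSE keeps the result a data frame even if one column remains.
cols_kept <- x[, !drop_apply, drop = FALSE]
print(names(cols_kept))  # "a" "c"

# sapply() on a matrix visits every cell, so the result has one entry
# per cell (9 here) rather than one per column (3).
per_cell <- sapply(na_mat, any)
print(length(per_cell))  # 9
```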
 
   Ben

Re: removing NA from a data frame

by Adaikalavan Ramasamy

You might find the 2nd part of the following response useful
https://stat.ethz.ch/pipermail/r-help/2006-March/090611.html

And if you want to RTFM, I guess sections 2.5, 2.7, 5.1, 5.2 of
http://cran.r-project.org/doc/manuals/R-intro.html might be useful.


PS:

1) R-help is designed for and by unpaid volunteers. Therefore sometimes
RTFM without page reference is quite acceptable.

2) Similar questions often get repeated over and over on the list. It might
be useful to search http://finzi.psych.upenn.edu/nmz.html first.




Re: removing NA from a data frame

by Sam Steingold-2

> * Adaikalavan Ramasamy <enznfnzl@...> [2006-03-19 04:51:19 +0000]:
>
> 1) R-help is designed for and by unpaid volunteers. Therefore
> sometimes RTFM without page reference is quite acceptable.

I am an "unpaid volunteer" maintainer of CLISP (http://clisp.cons.org).
I often answer questions with specific link to the CLISP FAQ (which
really is just that - the list of the frequently asked questions).

If you do not feel like answering a question, it is perfectly fine with
me, I do not think that anyone owes me anything.
All I am asking is that if you do decide to answer, please make your
answer immediately useful, i.e., not requiring learning all the manual by
heart (a specific manual section is perfectly fine though).
Thanks.

PS. Many thanks to Ben Bolker and Haifeng Xie for their help!

PPS. how do I figure out the number of rows in a data.frame?
     is length(attr(X,"row.names")) the right way?

--
Sam Steingold (http://www.podval.org/~sds) on Fedora Core release 4 (Stentz)
http://www.camera.org http://pmw.org.il http://www.memri.org
http://www.jihadwatch.org http://www.dhimmi.com
Are you smart enough to use Lisp?

Re: removing NA from a data frame

by Bert Gunter

> If you do not feel like answering a question, it is perfectly
> fine with
> me, I do not think that anyone owes me anything.
> All I am asking is that if you do decide to answer, please make your
> answer immediately useful, i.e., not requiring learning all
> the manual by
> heart (a specific manual section is perfectly fine though).
> Thanks.
>
 
But do you not owe the list the courtesy of first making a reasonable
attempt on your own?


> PPS. how do I figure out the number of rows in a data.frame?
>      is length(attr(X,"row.names")) the right way?

help.search("number of rows") immediately gets you your answer!
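
For reference, a minimal sketch of what that search leads to, assuming a toy data frame X (the variable names n_rows, n_cols and dims are illustrative):

```r
# Toy data frame with 4 rows and 2 columns.
X <- data.frame(a = 1:4, b = letters[1:4])

n_rows <- nrow(X)  # number of rows (observations)
n_cols <- ncol(X)  # number of columns (variables)
dims   <- dim(X)   # both at once: rows first, then columns

print(n_rows)  # 4
```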

-- Bert Gunter
Genentech

Re: removing NA from a data frame

by Sam Steingold-2

> * Berton Gunter <thagre.oregba@...> [2006-03-20 09:42:49 -0800]:
>
>> If you do not feel like answering a question, it is perfectly
>> fine with
>> me, I do not think that anyone owes me anything.
>> All I am asking is that if you do decide to answer, please make your
>> answer immediately useful, i.e., not requiring learning all
>> the manual by
>> heart (a specific manual section is perfectly fine though).
>> Thanks.
>>
>  
> But do you not owe the list the courtesy of first making a reasonable
> attempt on your own?

I do.

Nevertheless, please remember that "a reasonable attempt" means
different things for a newbie and for an expert.

>
>> PPS. how do I figure out the number of rows in a data.frame?
>>      is length(attr(X,"row.names")) the right way?
>
> help.search("number of rows") immediately gets you your answer!

thanks!

--
Sam Steingold (http://www.podval.org/~sds) on Fedora Core release 4 (Stentz)
http://www.mideasttruth.com http://ffii.org http://www.memri.org
http://www.iris.org.il http://pmw.org.il http://www.palestinefacts.org
All extremists should be taken out and shot.

Replies on this list [was: removing NA from a data frame]

by François Pinard

[Berton Gunter]
>[Sam Steingold]

>> PPS. how do I figure out the number of rows in a data.frame?
>>      is length(attr(X,"row.names")) the right way?

>help.search("number of rows") immediately gets you your answer!

Hi, people.  Here, I get:

  Help files with alias or concept or title matching ‘number of rows’
  using fuzzy matching:

  nrow(base)              The Number of Rows/Columns of an Array

and '?nrow' says that it is meant for arrays: nothing about data.frame, and
not a generic method either.  Even if it was a class method, we should
not expect a new user to be very familiar with R (both!) class systems
from the start.

What might a new user think, reading the documentation?  Sam Steingold
is surely an experienced and competent computer guy.  He might guess,
who knows, that some automatic array to data.frame conversion occurs
(however inefficient that might be).  Yet this would not match other
knowledge nor experimentation, as a data.frame is hardly an array:

  > x = data.frame(a=1:3, b=c(TRUE, TRUE, FALSE), c=letters[1:3])
  > as.array(x)
  Error in "dimnames<-.data.frame"(`*tmp*`, value = list(c("a", "b", "c" :
          invalid 'dimnames' given for data frame

Although help.search("number of rows") provides an answer that happens to
be right, it might not be recognised as such by an intelligent reader,
and so, it is not really satisfactory.  The documentation for "nrow"
could be improved by saying that it applies to any kind of structure for
which dim() is meaningful.  And even then, ?dim is silent about data
frames.  One clue (yet a pretty weak one) that nrow may be applied to
a data.frame comes from the fact that ?dim.data.frame lists the same
documentation as ?dim.
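
The relationship described above can be checked directly. This is only a sketch on the same small data frame; the variable names d, same, is_list, is_arr and m are assumptions:

```r
x <- data.frame(a = 1:3, b = c(TRUE, TRUE, FALSE), c = letters[1:3])

# dim() works on a data frame, and nrow() agrees with its first element.
d <- dim(x)
same <- identical(nrow(x), d[1L])

# A data frame is a list of columns, not an array.
is_list <- is.list(x)
is_arr  <- is.array(x)

# An explicit conversion to a two-dimensional structure goes through
# as.matrix(); with mixed column types the result is a character matrix.
m <- as.matrix(x)

print(c(same, is_list, is_arr))  # TRUE TRUE FALSE
```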


Why do I say all this?  Because it happens, not necessarily in this
case, a bit too often nevertheless, that answers given to users are
uselessly harsh or haughty.  Especially when they imply that the
documentation is perfect.  One problem is that some people enjoy reading
such replies.  As an example of this strange kind of pleasure, here is
an excerpt from the R Archives, which I find especially enlightening on the
mentality of a few members:

  From: swis@... (Steve Wisdom)
  Date: 2003-12-26 17:04
  Subject: [R] re| Dr Ward on List protocol

  "Andrew C. Ward" <acward@...> :

  >With respect to 'tone' and 'friendliness', perhaps all that is meant or
  >needed is that people be polite and respectful.

  >I shake my head as often at rude answers

  Oh, by gosh, by golly.

  I don't think an occasional dose of 'real life', via a jab from the
  Professor, will cause any lasting harm to the cosseted & emolumated students
  and academics on the List.

  On a Wall St trading desk, for example, every day one is kicked in the head
  more brutally by clients, superiors, counterparts, the markets & etc, than
  ever one would be by the Professor.

  Plus, the Professor's jabs are good Schadenfreudic fun for the rest of us.

  Regards,

  Steve Wisdom
  Westport CT US

The truth is that not everybody around here is "cosseted & emolumated
students and academics".  Moreover, behaviour at trading desks is fully
irrelevant, and for most of us, this is not the kind of life we chose to
live.  Wrong behaviour elsewhere is hardly an excuse for not behaving
properly, here.

Moreover, what is mere "good fun" for some may be perceived as highly
inelegant by others.  While some competent members may inspire
admiration and charisma by their knowledge and dedication, they sometimes
damage beyond repair what they inspire, when showing poor humanity.

I'm aware of the constant fear some have of seeing this list abused.  
There are ways for not being abused, which do not require becoming
abusive ourselves.  We should deepen such ways in our own habits.

--
François Pinard   http://pinard.progiciels-bpi.ca

Re: Replies on this list [was: removing NA from a data frame]

by Uwe Ligges

François Pinard wrote:

> [Berton Gunter]
>
>>[Sam Steingold]
>
>
>>>PPS. how do I figure out the number of rows in a data.frame?
>>>     is length(attr(X,"row.names")) the right way?
>
>
>>help.search("number of rows") immediately gets you your answer!
>
>
> Hi, people.  Here, I get:
>
>   Help files with alias or concept or title matching ‘number of rows’
>   using fuzzy matching:
>
>   nrow(base)              The Number of Rows/Columns of an Array
>
> and '?nrow' says that it is meant for arrays: nothing about data.frame, and

It does mention data frames! ?nrow says in its argument description:

"x a vector, array or data frame "

Uwe Ligges



