Delete rows from R data frame

Deleting rows from a data frame in R is easy by combining simple operations. Let’s say you are working with the built-in data set airquality and need to remove rows where the ozone is NA (also called null, blank or missing). The method is a conceptually different than a SQL database that has a dedicated delete command: in R deleting rows can be done simply by replacing the data frame with another data frame without those rows.

Before we make any changes, let’s count the number of NA records:

summary(airquality$Ozone)

The next step is identifying the rows. This code prints the rows where the Ozone is NA using a list comprehension:

airquality[is.na(airquality$Ozone),]

If you are a beginner, it’s worth analyzing this step in detail. Try running the inner part by itself:

is.na(airquality$Ozone)

This yields a long vector of TRUE and FALSE. When put plugged in to the data frame (the first code fragment), it tells R which rows to return. Since we want to remove the NA, we just need to reverse it using a boolean-not operator:

airquality[!is.na(airquality$Ozone),]

You just printed the desired data frame (where Ozone is not NA) to the screen. The last step (the only step you really need) is to “delete” the rows by recreating the data frame: just reassign the data frame from the filtered rows.

airquality <- airquality[!is.na(airquality$Ozone),]

To verify it worked, run:

summary(airquality$Ozone)

Now there are no NA records for Ozone, but there are 5 for Solar. To filter two columns (variables) at a time, combine them with boolean logic:

airquality<-airquality[!is.na(airquality$Ozone) & !is.na(airquality$Solar.R),]

This entry was posted in Basics and tagged , , , . Bookmark the permalink.

17 Responses to Delete rows from R data frame

  1. erikr says:

    Good tip. If you want to subset a data.frame to only rows that have no missing information, you can use the “complete.cases” function, like the following…

    >tmp tmp
    a b
    1 1 NA
    2 2 2
    3 3 3
    4 4 4
    5 5 5
    6 6 6
    7 7 7
    8 8 8
    9 9 9
    10 10 10
    11 NA 11
    > tmp[complete.cases(tmp),]
    a b
    2 2 2
    3 3 3
    4 4 4
    5 5 5
    6 6 6
    7 7 7
    8 8 8
    9 9 9
    10 10 10

  2. erikr says:

    Well, that was a little bit mangled, sorry. The important bit is that for a data.frame called “tmp”, try

    tmp[complete.cases(tmp),]

  3. heuristicandrew says:

    Nice, erick. I didn’t notice complete.cases() before.

  4. Bob Muenchen says:

    An easy way to get rid of rows that contain any NAs is with na.omit:

    > summary(airquality$Ozone)
    Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
    1.00 18.00 31.50 42.13 63.25 168.00 37.00

    > temp summary(temp$Ozone)
    Min. 1st Qu. Median Mean 3rd Qu. Max.
    1.0 18.0 31.0 42.1 62.0 168.0

    Cheers,
    Bob

  5. Bob Muenchen says:

    Wow, what happened to that paste?? I’ll try again:

    > summary(airquality$Ozone)
    Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
    1.00 18.00 31.50 42.13 63.25 168.00 37.00

    > temp summary(temp$Ozone)
    Min. 1st Qu. Median Mean 3rd Qu. Max.
    1.0 18.0 31.0 42.1 62.0 168.0

  6. Bob Muenchen says:

    Grrr! What is not pasting I’ll just type in:

    This makes a copy with all rows containing NAs removed:

    temp <- na.omit(airquality)

    And this shows they're gone:

    summary(temp$Ozone)

  7. callmeRK says:

    What if instead of NA’s we wanted to omit something else, say a string like “Wrong” or a 0?

    Thanks

  8. Calo says:

    Hi, pls forgive, also a “newbie” here.

    What if, in a data.frame, there is a variable called “acsyr.” This field/variable has three years, “2007,” “2008,” and “2009,” and each of those years counts the number of observations occurring during that year. However, by some unknown reason, it has an additional year “2.” Yes, “2,” with 7 observations.

    I want to delete just those 7 records. I’ve tried several examples from here and other places, but can’t get it to work.

    Subsetting and creating a new data.frame is not efficient in this case because the data.frame is huge. It has about 9 million records and 24 fields/variables. Even though I have a pretty good laptop (8 megs of RAM, i7, 1 terrabyte hard-drive, Windows 7, etc), I’ve experienced problems with memory and R.

    I take that back, subsetting could work with this data.frame but I have another data.frame which is almost twice the size and for which I will need to do the same procedure. The data I’m using is actually the 3-year ACS from the Census. If anyone is familiar with that data, I stripped the first four characters from the “SERIALNO” from each observation, to capture the year (and later do a Cox proportional hazards model).

    Any help would be greatly appreciated. THANKS!

    Here are the results of table(s3yr$acsyr):

    > table(s3yr$acsyr)

    2 2007 2008 2009
    7 2994658 3000655 3030727

  9. Calo says:

    The pasting of the table results didn’t show very well, let’s see if this shows it better:
    2 2007 2008 2009
    7 2994658 3000655 3030727

  10. Calo says:

    Ahh, it comes out the same… each figure in the bottom row corresponds to each year from the first row. Anyhow, any help would be greatly, greatly appreciated.

  11. Rodrigo says:

    Hi! thanks, it was very helpfull. I have a simple question :
    How can I “translate” NA to 0 or -1 ?
    I use “for” but i believe that there is a (lot) better solution in R@
    Thanks in advance.

    • heuristicandrew says:

      The is.na() function translates NA to TRUE or FALSE which have values 1 and 0. For example,

      x <- c(1,2,3,NA,5,6,NA,7)
      is.na(x)
      
      • Rodrigo says:

        I see… but…
        i want the result:
        1,2,3,0,5,6,0,7
        A simple is.na(x) gives me [1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE

      • heuristicandrew says:

        Rodrigo: Use the ifelse() function like this, vectorized without a for loop:

        x2 <- ifelse(is.na(x), 0, x)
        
  12. Rodrigo says:

    Thank you!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s