Deleting rows from a data frame in R is easy by combining simple operations. Let’s say you are working with the built-in data set airquality and need to remove rows where Ozone is NA (R’s marker for a missing value, which other tools call null or blank). The approach is conceptually different from a SQL database, which has a dedicated DELETE command: in R, you delete rows by replacing the data frame with a copy that leaves those rows out.
Before we make any changes, let’s count the number of NA records:
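A minimal sketch of that count, assuming the stock airquality data set that ships with R (na_count is just an illustrative name):

```r
# is.na() returns TRUE for each missing Ozone value;
# sum() treats TRUE as 1, so the total is the number of NA records.
na_count <- sum(is.na(airquality$Ozone))
print(na_count)
```

On the unmodified data set this prints 37.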
The next step is identifying the rows. This code prints the rows where Ozone is NA using logical indexing (a TRUE/FALSE vector placed in the row position of the square brackets):
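One way to write it, as a sketch against the stock airquality data set (na_rows is an illustrative name, not something the rest of the article depends on):

```r
# Keep only the rows where Ozone is missing.
# The empty slot after the comma means "all columns".
na_rows <- airquality[is.na(airquality$Ozone), ]
print(na_rows)
```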
If you are a beginner, it’s worth analyzing this step in detail. Try running the inner part by itself:
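Assuming the inner part is the is.na() call on the Ozone column, it can be run on its own like this (is_missing is an illustrative name):

```r
# The row selector by itself: one TRUE/FALSE value per row of airquality.
is_missing <- is.na(airquality$Ozone)
print(is_missing)

# The vector is exactly as long as the data frame has rows.
length(is_missing) == nrow(airquality)
```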
This yields a long vector of TRUE and FALSE values, one per row. When plugged into the row position of the data frame’s brackets, as in airquality[is.na(airquality$Ozone), ], it tells R which rows to return. Since we want to remove the NA rows, we just need to reverse it using the logical NOT operator (!):
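A sketch of the reversed filter on the stock airquality data set (ozone_present is an illustrative name):

```r
# ! flips every TRUE to FALSE and vice versa,
# so this keeps the rows where Ozone is NOT missing.
ozone_present <- airquality[!is.na(airquality$Ozone), ]
print(ozone_present)
```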
You just printed the desired data frame (where Ozone is not NA) to the screen. The last step (the only step you really need) is to “delete” the rows by recreating the data frame: just reassign the data frame from the filtered rows.
airquality <- airquality[!is.na(airquality$Ozone),]
To verify it worked, run:
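One possible check, sketched with base R only. The reassignment is repeated at the top so the snippet runs on its own, and na_by_column is an illustrative name:

```r
# Repeat the "delete" step so this snippet is self-contained.
airquality <- airquality[!is.na(airquality$Ozone), ]

# Count the remaining NA values in Ozone; this should now be zero.
sum(is.na(airquality$Ozone))

# Count the NA values in every column at once.
na_by_column <- colSums(is.na(airquality))
print(na_by_column)
```

If you want the original data back, rm(airquality) removes your modified copy and the built-in version becomes visible again.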
Now there are no NA records for Ozone, but there are 5 for Solar.R. To filter on two columns (variables) at a time, combine the conditions with the logical AND operator (&):
airquality <- airquality[!is.na(airquality$Ozone) & !is.na(airquality$Solar.R), ]
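To confirm the two-column filter, here is a hedged sketch that works on a fresh copy of the built-in data so it does not depend on earlier steps (aq, filtered and same_as_complete are illustrative names). It also compares the result with base R’s complete.cases(), which flags rows that have no NA in any column; that shortcut is an alternative, not the method used above.

```r
# Start from the untouched built-in data set.
aq <- datasets::airquality

# Keep rows where both Ozone and Solar.R are present.
filtered <- aq[!is.na(aq$Ozone) & !is.na(aq$Solar.R), ]

# No NA values should remain in either column.
print(colSums(is.na(filtered)))

# complete.cases() checks every column, not just these two.
same_as_complete <- identical(filtered, aq[complete.cases(aq), ])
print(same_as_complete)
```

The two approaches agree here only because Ozone and Solar.R are the only columns in airquality that contain NA values; on other data, complete.cases() may drop more rows than you intended.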