5 Ways to Filter Data in R
A fundamental piece in data cleaning
Python and R are the two key players in the data science ecosystem. While R is not as popular as Python, it is just as efficient and capable as Python at data manipulation and analysis, and even outperforms Python in some cases.
In this article, we will learn 5 different ways to filter data in R. Filtering is one of the most frequently performed data wrangling operations. We filter data for two main reasons:
- Not all the data is needed for the task at hand
- Some part of the data is redundant, not useful, or just bad
How to filter data largely depends on the data type, but many methods work across different data types, as we will see in the examples.
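To make that concrete, here is a minimal sketch on a tiny made-up table. The table name `mock` and its columns and values are assumptions for illustration only, not the article's dataset. It relies on the data.table package and only aims to show that the same bracket syntax handles both numeric and character conditions.

```r
library(data.table)

# Hypothetical mock data; names and values are illustrative only
mock <- data.table(
  store = c("Violet", "Rose", "Violet", "Daisy"),
  price = c(12.5, 3.2, 48.0, 7.9),
  stock = c(100L, 0L, 35L, 12L)
)

# Filter on a numeric column with a comparison operator
expensive <- mock[price > 10]

# Filter on a character column with an equality check
violet <- mock[store == "Violet"]

# The %in% operator works the same way on character and numeric columns
some_stores <- mock[store %in% c("Rose", "Daisy")]
some_stock <- mock[stock %in% c(0L, 12L)]
```

Each expression returns a new data.table that contains only the matching rows, so the original `mock` table is left unchanged.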
We will be using a sample dataset that I prepared with mock data. You can download it from my datasets repo. Let’s start by creating a data table from the “sales_data_with_stores” CSV file.
library(data.table)
dt <- fread("sales_data_with_stores.csv")
# display the first 6 rows
head(dt)
The dataset contains both numeric and textual columns. Before we start, let’s briefly…