Grouping and aggregation of data in R using RStudio

A summary of the data is usually the first thing a leader or senior stakeholder looks at. Likewise, when we first see a dataset, the natural questions are: what is the average value, how many records are there, and so on.

After reading this article, you will be able to:

Group data on some pre-defined criteria
Calculate an aggregate value or summary based on the group

We'll use the songs dataset (Hindi_Songs.csv) for all illustrations. You can download the dataset by clicking here.

library(dplyr)  # provides group_by(), summarise() and %>%

# read the dataset
Songs_DF <- read.csv("Hindi_Songs.csv")
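The column names used later (Singer, Views, Lead.Actor) depend on how read.csv() cleans up the CSV header. As a small sketch, assuming the header contains a column written as "Lead Actor" (an assumption, since the actual header is not shown here), the snippet below writes a tiny throwaway CSV with made-up rows and shows that read.csv() replaces the space with a dot by default. The names tmp, demo_df and demo_raw are placeholders for illustration.

```r
# Hypothetical tiny CSV, written to a temporary file for illustration only
tmp <- tempfile(fileext = ".csv")
writeLines(c("Singer,Views,Lead Actor",
             "A,100,X",
             "B,250,Y"), tmp)

# check.names = TRUE is the default: names are made syntactically valid
demo_df <- read.csv(tmp)
names(demo_df)

# check.names = FALSE keeps the header exactly as written
demo_raw <- read.csv(tmp, check.names = FALSE)
names(demo_raw)

str(demo_df)  # quick look at column types before aggregating
```

Running names() or str() on Songs_DF in the same way is a quick check that the columns you plan to group and summarise by are spelled as expected.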

Group by

To aggregate data, first group it by some pre-defined criteria, then calculate statistics for each group. For numerical data the statistics can be the average, sum, minimum, maximum, etc. For non-numerical data they can be the count, the unique count, etc.
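Before turning to the songs data, the two steps can be sketched on a made-up data frame. Everything here (toy, team, score, city) is hypothetical and only meant to show one numerical and one non-numerical column being summarised per group with dplyr.

```r
library(dplyr)

# Made-up data, not taken from the songs dataset
toy <- data.frame(
  team  = c("a", "a", "b", "b", "b"),
  score = c(10, 20, 5, 15, 40),
  city  = c("Pune", "Pune", "Delhi", "Agra", "Delhi")
)

toy_summary <- toy %>%
  group_by(team) %>%                  # step 1: define the groups
  summarise(avg_score = mean(score),  # step 2: numerical statistics
            total     = sum(score),
            rows      = n(),          # count of rows in the group
            cities    = n_distinct(city))  # unique count of a text column

print(toy_summary)
```

The result has one row per team: the numerical column is reduced with mean() and sum(), while the text column is reduced to a count and a unique count.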

In the following example, we'll group the songs data by singer and calculate the average, minimum and maximum views for each singer. We'll also count the lead-actor entries and the unique lead actors each singer has worked with.

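# group by singer, then compute one summary row per singer
Songs_DF %>%
  group_by(Singer) %>%
  summarise(Avg_views = mean(Views),
            Min_view = min(Views),
            Max_View = max(Views),
            Actor_Count = length(Lead.Actor),              # rows per singer
            Unique_Actor = length(unique(Lead.Actor))) %>% # distinct lead actors
  View()  # open the result in the RStudio data viewer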

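length(Lead.Actor) counts the lead-actor entries in each group, i.e. one per song, repeats included, while length(unique(Lead.Actor)) counts each lead actor only once. dplyr also offers n() and n_distinct(Lead.Actor) as shorthands for these two counts.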
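To make the difference between the two counts concrete, here is a small sketch on a made-up vector standing in for the lead actors of one singer's songs. The vector lead_actor and its values are placeholders, not data from the songs dataset.

```r
library(dplyr)

# Hypothetical lead actors for the songs of one singer
lead_actor <- c("Actor A", "Actor B", "Actor A", "Actor C", "Actor A")

length(lead_actor)          # 5: one entry per song, repeats included
length(unique(lead_actor))  # 3: each actor counted once
n_distinct(lead_actor)      # 3: dplyr shorthand for the line above
```

Inside summarise(), the same expressions are evaluated once per group, which is why each singer gets their own Actor_Count and Unique_Actor.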

Output

Summary of Learning

group_by()
summarise()
%>% operator

[ Part 5] - grouping and aggregation of data in R using RStudio
