A summary of the data is usually the first thing a leader or senior stakeholder looks at. Whenever we see a new dataset, the first questions that come to mind are things like: what is the average value, and how many records are there?
After reading this article, you will be able to:
- Group data by pre-defined criteria
- Calculate an aggregate value or summary for each group
We'll use the songs dataset (Hindi_Songs.csv) for all illustrations. You can download the dataset by clicking here.
```r
# load dplyr, which provides group_by(), summarise() and the %>% operator
library(dplyr)

# read the dataset
Songs_DF <- read.csv("Hindi_Songs.csv")
```
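If you don't have Hindi_Songs.csv at hand, a small stand-in data frame with the three columns used in this article (Singer, Views and Lead.Actor) is enough to follow along; run it in place of the read.csv() call. The singer names, actor names and view counts below are made up purely for illustration, and the real file very likely has more rows and columns.

```r
# Hypothetical stand-in for Hindi_Songs.csv: only the three columns
# used in this article, filled with made-up values
Songs_DF <- data.frame(
  Singer     = c("Singer A", "Singer A", "Singer A", "Singer B", "Singer B", "Singer C"),
  Views      = c(120, 300, 90, 500, 250, 75),
  Lead.Actor = c("Actor X", "Actor Y", "Actor X", "Actor Z", "Actor Z", "Actor Y"),
  stringsAsFactors = FALSE
)

# Check the column names and types before grouping
str(Songs_DF)
```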
Group by
To aggregate data, the first step is to group it by some pre-defined criteria; the next step is to calculate statistics for each group. For numerical data, the statistics can be the average (mean), sum, minimum, maximum, etc. For non-numerical data, they can be a count, a count of unique values, etc.
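To see both kinds of statistics side by side, here is a minimal sketch on a tiny made-up data frame (the Singer, Views and Lead.Actor values are hypothetical). sum(), min() and max() summarise the numeric Views column, while dplyr's n() and n_distinct() count the rows and the distinct values of the text column. Without group_by(), summarise() collapses the whole table into a single row; with it, you get one row per group.

```r
library(dplyr)

# Hypothetical toy data: two singers, five songs
songs <- data.frame(
  Singer     = c("A", "A", "A", "B", "B"),
  Views      = c(10, 20, 30, 5, 15),
  Lead.Actor = c("X", "Y", "X", "Z", "Z"),
  stringsAsFactors = FALSE
)

# Without group_by(): one summary row for the whole table
overall <- songs %>%
  summarise(Total_Views = sum(Views),
            Songs       = n())

# With group_by(): one summary row per singer
by_singer <- songs %>%
  group_by(Singer) %>%
  summarise(Total_Views   = sum(Views),              # numeric: sum
            Min_Views     = min(Views),              # numeric: minimum
            Max_Views     = max(Views),              # numeric: maximum
            Songs         = n(),                     # rows in the group
            Unique_Actors = n_distinct(Lead.Actor))  # distinct lead actors

print(overall)
print(by_singer)
```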
In this example, we'll group the data by singer and calculate the average, minimum and maximum views for each singer. We'll also count the lead-actor entries (one per song) and the number of unique lead actors each singer has worked with.
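```r
Songs_DF %>%
  group_by(Singer) %>%                                  # one group per singer
  summarise(Avg_Views    = mean(Views),                 # average views
            Min_Views    = min(Views),                  # lowest views
            Max_Views    = max(Views),                  # highest views
            Actor_Count  = length(Lead.Actor),          # lead-actor entries (one per song)
            Unique_Actor = length(unique(Lead.Actor))) %>%  # distinct lead actors
  View()                                                # open the result in the data viewer
```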
`length(Lead.Actor)` counts every lead-actor entry for a singer (one per song, repeats included), while `length(unique(Lead.Actor))` counts only the distinct lead actors.
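dplyr also provides n() and n_distinct(), which give the same counts as length(Lead.Actor) and length(unique(Lead.Actor)) inside summarise(). The sketch below, on a made-up data frame, computes both spellings so you can check that they agree. One difference worth knowing: n_distinct() accepts an na.rm argument, whereas length(unique(x)) always counts NA as a value of its own.

```r
library(dplyr)

# Hypothetical toy data with a repeated lead actor for singer "A"
songs <- data.frame(
  Singer     = c("A", "A", "A", "B"),
  Views      = c(10, 20, 30, 40),
  Lead.Actor = c("X", "X", "Y", "Z"),
  stringsAsFactors = FALSE
)

counts <- songs %>%
  group_by(Singer) %>%
  summarise(Actor_Count   = length(Lead.Actor),          # all entries
            Actor_Count_n = n(),                         # rows in the group
            Unique_Actor  = length(unique(Lead.Actor)),  # distinct entries
            Unique_n      = n_distinct(Lead.Actor))      # same, via dplyr

print(counts)
```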
Output
Summary of Learning
- Grouping data with group_by()
- Summarising each group with summarise()
- Chaining the steps with the %>% operator
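The %>% operator (from the magrittr package, re-exported by dplyr) passes the value on its left as the first argument of the function on its right, so x %>% f(y) is the same as f(x, y). The sketch below, on hypothetical data, writes a grouped average both ways and checks that the results match.

```r
library(dplyr)

# Hypothetical toy data
songs <- data.frame(
  Singer = c("A", "A", "B"),
  Views  = c(10, 30, 50),
  stringsAsFactors = FALSE
)

# Nested form: read from the inside out
nested <- summarise(group_by(songs, Singer), Avg_Views = mean(Views))

# Piped form: read from left to right
piped <- songs %>%
  group_by(Singer) %>%
  summarise(Avg_Views = mean(Views))

print(piped)
identical(nested, piped)
```

Both forms do the same work; the piped version simply reads in the order the steps happen.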