[Part 3] - Load packages and data in R using RStudio

After reading this article, you will be able to:

  • Install and load the required packages.
  • Load the data for analysis.
  • Perform a high-level data sanity check.

What are packages in R?

R packages are collections of functions and data, usually built for a specific task. The base package ships with every R installation and contains the basic functions that let R work as a language.
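As a small illustration of the idea that functions belong to packages, the sketch below asks R where two functions come from. It uses only functions that ship with R; the variable names mean_pkg and csv_pkg are chosen here purely for illustration.

```r
# mean() lives in the base package, so it works without any library() call.
mean(c(2, 4, 6))

# Ask R which package each function comes from.
mean_pkg <- environmentName(environment(mean))
csv_pkg <- environmentName(environment(read.csv))
print(c(mean_pkg, csv_pkg))

# List the packages currently attached to the session.
search()
```

The output of search() depends on your session, so it will differ from one machine to another.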

One of the most widely used packages for data analysis in R is tidyverse, which bundles a collection of related packages. We need to install it before using it. Installation is a one-time activity, so we can run the command directly in the console. The command to install tidyverse is

# install package tidyverse 
install.packages("tidyverse")

After installation, we need to load the package in every new R session where we want to use it. The command to load a package is

# Load the recently installed tidyverse for data analysis
library(tidyverse)
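If a script is shared with others, it can be handy to install a package only when it is missing. The following is a minimal sketch of that pattern, not the only way to do it. The helper name load_or_install is hypothetical; requireNamespace(), install.packages(), and library() are standard R functions.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string.
  library(pkg, character.only = TRUE)
  # Report (invisibly) whether the package is now attached.
  invisible(paste0("package:", pkg) %in% search())
}
```

Calling load_or_install("tidyverse") would then replace the separate install and load steps, at the cost of needing an internet connection the first time it runs.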

About the dataset

I've created a songs dataset from YouTube. It contains songs by three popular singers: Kishore Kumar, Mohammad Rafi, and Mukesh. For each song it records the YouTube channel name, song title, movie name, singer, lead actor, view count, number of likes, and number of comments. We'll use this dataset for the analysis.

Reading data

Loading data is the first step in the data wrangling process. R has multiple functions to read data. read.csv() is one of them and is used to read CSV files.

Before we load the data, we can set the working directory, which is the folder R uses by default to read and write files.

getwd() # to know the current working directory
setwd("E:/data literacy/data wrangling") # to change the working directory

# to load data
Songs_DF = read.csv("Hindi_Songs.csv")
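To try read.csv() without the real file, the sketch below writes a tiny stand-in CSV to a temporary file and reads it back. The rows and the column names (Title, Singer, Views) are made-up placeholders, not the actual contents of Hindi_Songs.csv.

```r
# Write a tiny stand-in CSV to a temporary file (placeholder rows only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Title,Singer,Views",
  "Song A,Kishore Kumar,1200",
  "Song B,Mukesh,800"
), csv_path)

# Check that the file exists before reading; this gives a clearer error
# than read.csv() does when the path or working directory is wrong.
stopifnot(file.exists(csv_path))

# Read the file into a data frame; the first line becomes the column names.
sample_df <- read.csv(csv_path, stringsAsFactors = FALSE)

# Number of rows and columns, as also shown in the Environment pane.
dim(sample_df)
```

Passing a full path to read.csv(), as done here, is an alternative to calling setwd() first.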

The data object will now appear in the Environment pane, which shows the number of records and columns. Here, the data frame has 78 rows and 8 columns.

Data sanity check

We can look at the data at a high level by running the View() command. Note the capital V, since R is case sensitive.

View(Songs_DF)

View_Output.png

The str() function shows the structure of the data frame, including the data type of each column. This gives an idea of any transformations that may be required.

str(Songs_DF)

str() Output.png
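The sketch below shows how a sanity check can lead to a transformation. It uses a small made-up data frame (songs_sample, with placeholder values) in which the view counts are stored as text, which is the kind of issue str() helps reveal.

```r
# Made-up stand-in data; Views is stored as text on purpose.
songs_sample <- data.frame(
  Title = c("Song A", "Song B", "Song C"),
  Views = c("1200", "800", NA),
  stringsAsFactors = FALSE
)

# Structure: shows that Views is character, not numeric.
str(songs_sample)

# First rows, a console alternative to View().
head(songs_sample, 2)

# Count missing values in each column.
missing_counts <- colSums(is.na(songs_sample))
print(missing_counts)

# Transformation suggested by the check: convert Views to numeric.
songs_sample$Views <- as.numeric(songs_sample$Views)
str(songs_sample)
```

The same checks can be run on Songs_DF once it is loaded; whether any column needs converting depends on what str() reports for the real data.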

Summary of learning

  1. Install and load libraries
  2. Load data
  3. Data sanity check