Raw data is often not directly usable; it usually has to be transformed first. The process of converting raw data into a usable format is called data wrangling. Many tools are available for data wrangling, and one of the most popular is the pandas package for Python.
Loading Data
Loading data is the first step in the data wrangling process. Pandas provides many reader functions that load data into a DataFrame; commonly used ones include pd.read_csv, pd.read_excel, and pd.read_json. For example, to read one sheet from an Excel file:
import pandas as pd
Song_DF = pd.read_excel("data/Hindi_Songs.xlsx", sheet_name="Songs")
pd.read_excel accepts many arguments; the file path (its first argument, io) and sheet_name are the most commonly used. If sheet_name is omitted, the first sheet is read.
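As a minimal sketch of how reader arguments narrow down what gets loaded, the example below uses pd.read_csv on in-memory text so that it runs without any file on disk. The column names and values are hypothetical and are not taken from the Hindi_Songs workbook. The usecols and nrows arguments shown here are also accepted by pd.read_excel.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real file (made-up columns and values).
csv_text = """Channel,Title,Movie,Views
Channel A,Song 1,Movie X,1000
Channel B,Song 2,Movie Y,2500
Channel A,Song 3,Movie Z,400
"""

# usecols keeps only the listed columns; nrows limits how many data rows are read.
sample_df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Title", "Views"],
    nrows=2,
)
print(sample_df)
```

Restricting columns and rows at read time can be handy for a quick first look at a large file before loading it in full.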
Data sanity check
Once the data is loaded, a sanity check is an important next step. This involves (but is not limited to):
looking at the data on a high level
getting the count of rows and columns
understanding the data types and null values
1. Data at a high level
Every DataFrame has a head() method that returns the first 5 rows by default.
Song_DF.head()
This dataset contains details of songs on YouTube, such as the name of the channel, the song title, the movie, etc.
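head() also accepts the number of rows to return, and tail() does the same from the end of the DataFrame. The sketch below uses a small hypothetical DataFrame (the songs variable and its columns are illustrative, not the real dataset) so that it can be run on its own.

```python
import pandas as pd

# Hypothetical stand-in for Song_DF with eight made-up rows.
songs = pd.DataFrame({
    "Title": [f"Song {i}" for i in range(1, 9)],
    "Movie": [f"Movie {i}" for i in range(1, 9)],
})

first_five = songs.head()    # default: first 5 rows
first_three = songs.head(3)  # first 3 rows
last_two = songs.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```

Looking at both ends of the data is a cheap way to spot trailing summary rows or truncated loads.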
2. Dimension of data
The shape attribute of a DataFrame returns its dimensions as a tuple: (number of rows, number of columns).
Song_DF.shape
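Because shape is a plain tuple, it can be unpacked directly into a row count and a column count. A minimal sketch, again on a hypothetical DataFrame rather than the real Song_DF:

```python
import pandas as pd

# Hypothetical stand-in for Song_DF (made-up values).
songs = pd.DataFrame({
    "Title": ["Song 1", "Song 2", "Song 3"],
    "Movie": ["Movie X", "Movie Y", "Movie Z"],
})

# shape is an attribute, not a method, so there are no parentheses.
n_rows, n_cols = songs.shape
print(f"{n_rows} rows x {n_cols} columns")
```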
3. Understand the data types of columns and null values
Song_DF.info()
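info() prints each column's name, non-null count, and data type, along with the index range and memory usage. When only the number of missing values per column is needed, isnull().sum() gives it directly. The sketch below assumes a small hypothetical DataFrame with a few missing values; the column names are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for Song_DF with some missing values.
songs = pd.DataFrame({
    "Title": ["Song 1", "Song 2", "Song 3"],
    "Movie": ["Movie X", None, "Movie Z"],
    "Views": [1000, 2500, None],
})

# Prints column names, non-null counts, and dtypes to standard output.
songs.info()

# Number of missing values in each column.
null_counts = songs.isnull().sum()
print(null_counts)
```

A column whose non-null count is lower than the row count reported by shape contains missing values that may need handling before analysis.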