How to Load Excel Data and Perform a Sanity Check in Pandas?

Data in its raw form is often not very useful. It usually has to be transformed before it can be used. The process of converting raw data into a useful format is called data wrangling. Many tools are available for data wrangling, and one of the most popular is the pandas package in Python.

Loading Data

Loading data is the first step in the data wrangling process. Pandas provides many reader functions for reading data into a dataframe. Commonly used ones include read_csv, read_excel, read_json, and read_sql.

import pandas as pd
Song_DF = pd.read_excel("data/Hindi_Songs.xlsx", sheet_name="Songs")

pd.read_excel accepts many arguments. The file path (the first argument) and sheet_name are the most commonly used.
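As a rough illustration of how extra reader arguments narrow what gets loaded, here is a minimal sketch. It reads in-memory CSV text with pd.read_csv so that it runs without any file on disk; the usecols and nrows keyword arguments it uses are also accepted by pd.read_excel. The data and column names are made up for the example and are not taken from the Hindi_Songs workbook.

```python
import io
import pandas as pd

# Hypothetical stand-in data; column names and values are illustrative only.
raw = io.StringIO(
    "Channel,Title,Movie,Views\n"
    "ChannelA,Song One,Movie X,1000\n"
    "ChannelB,Song Two,Movie Y,2500\n"
    "ChannelC,Song Three,Movie Z,400\n"
)

# usecols keeps only the listed columns; nrows limits how many data rows are read.
# pd.read_excel accepts the same two keyword arguments.
sample_df = pd.read_csv(raw, usecols=["Title", "Views"], nrows=2)
print(sample_df)
```

Restricting columns and rows this way can be handy for a quick first look at a large file before loading it in full.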

Data sanity check

Once the data is loaded, a sanity check is an important next step. This involves (but is not limited to):

  • looking at the data at a high level

  • getting the count of rows and columns

  • understanding the data types and null values

1. Data at a high level

Every dataframe has a head() method that displays the first 5 rows by default.

Song_DF.head()

This dataset contains details of songs on YouTube, such as the name of the channel, the song title, the movie, etc.
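head() also takes an optional row count, and tail() does the same for the end of the dataframe. The sketch below uses a small made-up dataframe (demo_df and its columns are hypothetical, not the real song data) so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for Song_DF; values are illustrative only.
demo_df = pd.DataFrame(
    {
        "Title": [f"Song {i}" for i in range(8)],
        "Views": [100 * i for i in range(8)],
    }
)

first_three = demo_df.head(3)  # first 3 rows instead of the default 5
last_two = demo_df.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```

Checking both ends of the data is a cheap way to spot stray header or footer rows that sometimes come along with spreadsheet exports.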

2. Dimension of data

The shape attribute of a dataframe returns its dimensions as a tuple: (number of rows, number of columns).

Song_DF.shape
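Because shape is a plain tuple, it can be unpacked directly into row and column counts. A minimal sketch, again with a hypothetical demo_df rather than the real song data:

```python
import pandas as pd

# Hypothetical stand-in for Song_DF; values are illustrative only.
demo_df = pd.DataFrame(
    {
        "Title": ["Song One", "Song Two", "Song Three"],
        "Views": [1000, 2500, 400],
    }
)

# shape is a (rows, columns) tuple, so it unpacks into two integers.
n_rows, n_cols = demo_df.shape
print(f"{n_rows} rows, {n_cols} columns")
```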

3. Understand the data types of columns and null values

Song_DF.info()
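info() prints a summary with each column's name, non-null count, and data type. To get the number of missing values per column as data rather than printed text, isna().sum() is a common companion. The sketch below assumes a small made-up dataframe with a couple of deliberately missing values; demo_df and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for Song_DF with one missing value in each column.
demo_df = pd.DataFrame(
    {
        "Title": ["Song One", "Song Two", None],
        "Views": [1000.0, None, 400.0],
    }
)

# Prints column names, non-null counts, and dtypes.
demo_df.info()

# Number of missing values in each column, returned as a Series.
null_counts = demo_df.isna().sum()
print(null_counts)
```

Comparing the non-null counts against the total row count shows at a glance which columns need cleaning.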
