[part 7] - working with DateTime

One skill every analytics professional needs is the ability to work with dates and times. In this article, I'll explain the fundamentals of working with datetime data in R, using five years of power consumption data (2016 to 2020) as the example. You can download it by clicking here.

The dataset has two columns:

  • Datetime of consumption

  • Consumption in kWh

Loading the data and a high-level sanity check

setwd("E:/data literacy/data wrangling") # change the working directory
Power_DF <- read.csv("power_usage_2016_to_2020.csv")

> str(Power_DF)
'data.frame':    35952 obs. of  2 variables:
 $ StartDate  : chr  "06-01-2016 00:00" "06-01-2016 01:00" "06-01-2016 02:00" "06-01-2016 03:00" ...
 $ Value..kWh.: num  1.057 1.171 0.56 0.828 0.932 ...

The StartDate column was read in as character data, so we have to convert it to a DateTime datatype before we can perform DateTime operations on it.

String to DateTime

The strptime() function in R is widely used to convert a string into a DateTime datatype. Its two main arguments are:

  1. text to convert
  2. format of the data in the string

The format is built from the following abbreviations, which tell R how to read the string:

  • %d: Day of the month as a decimal number (01-31)
  • %m: Month as a decimal number (01-12)
  • %y: Year without century (00-99)
  • %Y: Year with century
  • %H: Hours as a decimal number (00-23)
  • %M: Minute as a decimal number (00-59)
  • %S: Second as an integer (00-61, allowing for leap seconds)
  • %b: Abbreviated month name
  • %B: Full month name
  • %a: Abbreviated weekday name

sample_string <- "2013-12-25 04:32:16"
date_value <- strptime(sample_string, format = "%Y-%m-%d %H:%M:%S")
print(sample_string)
[1] "2013-12-25 04:32:16"
print(date_value)
[1] "2013-12-25 04:32:16 IST"
class(sample_string)
[1] "character"
class(date_value)
[1] "POSIXlt" "POSIXt"
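
The same abbreviations cover other layouts as well. The following is a minimal sketch with made-up sample strings (none of them come from the power dataset) and an explicit tz = "UTC", which is an assumption so that the result does not depend on the local time zone. Note that a string that does not match the format yields NA rather than an error.

```r
# Made-up sample strings; none of them come from the power dataset.
# tz = "UTC" is an assumption so the result does not depend on the local zone.
two_digit_year <- strptime("25-12-13 04:32", format = "%d-%m-%y %H:%M", tz = "UTC")
slash_date     <- strptime("2013/12/25", format = "%Y/%m/%d", tz = "UTC")
mismatch       <- strptime("2013-12-25", format = "%d/%m/%Y", tz = "UTC")

format(two_digit_year, "%Y-%m-%d %H:%M:%S")  # "2013-12-25 04:32:00"
format(slash_date, "%Y-%m-%d")               # "2013-12-25"
is.na(mismatch)                              # TRUE: separators do not match
```

Because a mismatch fails silently, it is worth counting the NA values after converting a whole column.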

In the Power dataset, the format of StartDate is %d-%m-%Y %H:%M. There are no seconds. We'll create a new column Date_Time in DateTime format.

Power_DF$Date_Time <- strptime(Power_DF$StartDate, format = "%d-%m-%Y %H:%M")
str(Power_DF)
'data.frame':    35952 obs. of  3 variables:
 $ StartDate  : chr  "06-01-2016 00:00" "06-01-2016 01:00" "06-01-2016 02:00" "06-01-2016 03:00" ...
 $ Value..kWh.: num  1.057 1.171 0.56 0.828 0.932 ...
 $ Date_Time  : POSIXlt, format: "2016-01-06 00:00:00" "2016-01-06 01:00:00" ...

A new column, Date_Time, is created with class POSIXlt, which is one of R's two built-in DateTime classes (the other is POSIXct).
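
POSIXlt stores each timestamp as a list of components (seconds, minutes, hours, and so on), whereas POSIXct stores a single number of seconds since 1970-01-01 UTC, which is often the more convenient class for a data frame column. A minimal sketch of that alternative, assuming a small made-up data frame (parse_demo is hypothetical) and tz = "UTC" (also an assumption; substitute the zone the readings were actually recorded in):

```r
# Hypothetical toy data; the last row is deliberately malformed.
parse_demo <- data.frame(
  StartDate = c("06-01-2016 00:00", "06-01-2016 01:00", "not a date"),
  stringsAsFactors = FALSE
)

# as.POSIXct() accepts the same format string as strptime().
parse_demo$Date_Time <- as.POSIXct(parse_demo$StartDate,
                                   format = "%d-%m-%Y %H:%M", tz = "UTC")

class(parse_demo$Date_Time)        # "POSIXct" "POSIXt"
sum(is.na(parse_demo$Date_Time))   # 1: the row that did not match the format

# Date-time arithmetic works directly on the column.
gap_hours <- as.numeric(difftime(parse_demo$Date_Time[2],
                                 parse_demo$Date_Time[1], units = "hours"))
gap_hours                          # 1
```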

Create new columns for analysis

Now that the data is in DateTime format, we can extract components such as the day, month, year, and hour from the Date_Time column (the minute works the same way with %M).

Power_DF$Day <- as.integer(format(Power_DF$Date_Time, "%d"))
Power_DF$DayOfWeek <- format(Power_DF$Date_Time, "%a")
Power_DF$Month <- as.integer(format(Power_DF$Date_Time, "%m"))
Power_DF$MonthName <- format(Power_DF$Date_Time, "%b")
Power_DF$Year <- as.integer(format(Power_DF$Date_Time, "%Y"))
Power_DF$Hour <- as.integer(format(Power_DF$Date_Time, "%H"))

[Image: sample output with new columns]
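
Since a POSIXlt value is a list of components, the same pieces can also be read directly with $ instead of format(). A minimal sketch on a single made-up timestamp (tz = "UTC" is an assumption); note that mon is zero-based and year counts from 1900:

```r
# One illustrative timestamp in the same layout as StartDate.
x <- strptime("06-01-2016 13:00", format = "%d-%m-%Y %H:%M", tz = "UTC")

x$mday         # 6    day of the month
x$mon + 1      # 1    mon runs from 0 to 11, so add 1
x$year + 1900  # 2016 year is stored as years since 1900
x$hour         # 13
x$wday         # 3    day of the week, 0 = Sunday, so Wednesday
```

These components are already numeric, so no as.integer() conversion is needed.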

With these columns in place, more extensive analysis can be done on this dataset, for example:

  • Calculate average consumption on each day of the week
  • Calculate average, minimum, and maximum consumption on an hourly basis
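
Both calculations can be sketched with base R's aggregate(). The example below uses a small made-up data frame (agg_demo and its consumption values are hypothetical) rather than the real file, and keeps the Value..kWh. column name from the dataset:

```r
# Hypothetical toy data: 48 hourly readings starting 2016-01-06 00:00 UTC.
agg_demo <- data.frame(
  Date_Time = seq(as.POSIXct("2016-01-06 00:00", tz = "UTC"),
                  by = "hour", length.out = 48)
)
agg_demo$Value..kWh. <- rep(c(1, 2), each = 24)  # made-up consumption values
agg_demo$DayOfWeek <- format(agg_demo$Date_Time, "%a")
agg_demo$Hour <- as.integer(format(agg_demo$Date_Time, "%H"))

# Average consumption on each day of the week
avg_by_day <- aggregate(Value..kWh. ~ DayOfWeek, data = agg_demo, FUN = mean)

# Average, minimum, and maximum consumption for each hour of the day
hourly_stats <- aggregate(
  Value..kWh. ~ Hour, data = agg_demo,
  FUN = function(v) c(mean = mean(v), min = min(v), max = max(v))
)

print(avg_by_day)
print(head(hourly_stats))
```

When FUN returns several values, aggregate() stores them as a matrix inside the Value..kWh. column, so the mean is reached with hourly_stats$Value..kWh.[, "mean"].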

Learning

  1. Convert string data into DateTime format
  2. Extract additional information from the DateTime column