Call Center Data Sets: Prepare data, clean dirty data and perform analysis with SQL
Tools we used: MySQL Database
Language Used: SQL
Skills we applied: Data format, Data category, Working with dates, Handling empty values, Remove duplicates, Preparing data, Cleaning dirty data, Use count, percentage, group by, order by functions, Using min, max and avg functions etc
Clients Expectations:
- Prepare data for analysis
- Find reason persentage
- Find which day has maximum calls
- Find min, max and average call durations
- Analyze customers sentiments
- Analyze call centers calls number which are within, below and above the service
Data Set Format: CSV
Dataset & Code Download link:
Step 1:
Let's check our data table. We limit the results to 10 rows.
Code:
Output:
We have 12 column / data fields. The problems we see are:
- The date format under "call_timestamp" is mm/dd/yyyy but in SQl we need yyyy/mm/dd date format.
- In 11th column , the name of column contain spaces. We need to rename that.
- In 4th column there are some blank values. We need to fix that too.
Step 2:
Let's check the data types now...
Code:
Output:
The problems we found are:
- The csat_score is "int" type but here the data type is "varchar"
- "call_timestamp" is a date but here the data type is varchar we need to fix that
Step 3:
Let's convert the date format first
Code:
Let's check our data table. We limit the results to 10 rows.
Code:
Output:
- The format is now yyyy/mm/dd
Step 4:
We are going to fix the data types issues. We have to change the data types as date for "call_timestamp" and "csat_score" as int.
Code:
Let's check the data types again:
Output:
- All data types are correct now
Step 5:
Let's rename the 11th column name "call duration in minutes" because the name contains space
Code:
- The new name of the column is "CDinMin"
Step 6:
Now we are going to fix the black values in 4th column. We will replace those blanks with "NULL"
Code:
Let's check our data table. We limit the results to 10 rows.
Output:
- The blank values are replaced with "NULL"
Step 7:
Let's check for the duplicates values. First of all we will check the total row number
Code:
Output:
We have 32941 rows in our data table, Now lets check the unique row number for "id" and "customer_name"
Code:
Output:
- Both of them have 32941 rows, so we don't have any duplicate values.
Our data preparation is ready for analysis. Let's start our analysis now.
Step 8:
The operators called the customers for various reason, So let's count how many calls the made for each reason. We will calculate the percentage too.
Code:
Output:
- So we got the heights percentage for "Billing Question" reason. it's 71.2%
Step 9:
Let's find out which day has the most calls?
Code:
Output:
- Friday has most of the calls
Step 10:
Find out the minimum, maximum and average call duration.
Code:
Output:
- We got minimum duration 5min, maximum duration 45min and average duration 25 min
Step 11:
Now we are going to analyze the customers sentiments.
Code:
Output:
- We found that the negative sentiments are high comparing to the positive sentiments
Step 12:
Checking how many call are within, below or above the service for each call center.
Code:
Output:
- We found the number of calls which are within, below or above the service for each call center