311 Call Centre data can give invaluable insight into non-emergency municipal operations. From seeing where certain problems arise (and how often), to directing workloads within the department responsible for responding to those calls -- managing and visualizing this data is essential.

In this example I take a look at the City of Windsor's 311 calls from 2019-2020, gathered from the city's Open Data Catalogue. The workflow below uses Python to extract the data, transform and clean it, and then load it into a MySQL database running on AWS.

In Part 2, I'll use Tableau to connect to the database and create an interactive visualization of the calls.

I'll step through the code blocks below with a brief discussion of each. (Alternatively, the Jupyter Notebook is here on my GitHub.)

Step One: Extract the Data from the Open Data Catalogue

Pandas will be the main workhorse for compiling the structured tabular data, with NumPy for working with arrays.

import pandas as pd
import numpy as np

Next, the CSV files are loaded directly from the Open Data website: one file for 2019, plus a separate file for the current year (recent data is only available up to May 2020, unfortunately).

The Pandas .concat function is used to combine both CSV files into one dataframe named 'calldata.'

calldata = pd.concat(map(pd.read_csv, ['https://opendata.citywindsor.ca/Uploads/AllServiceRequests_YTD.csv', 
                                 'https://opendata.citywindsor.ca/Uploads/AllServiceRequests_2019.csv']), sort=False)

The .shape attribute shows that we are working with a total of 43,065 rows of individual calls, and that the calldata dataframe is composed of 8 columns of attributes.

calldata.shape
(43065, 8)

Use the .head function to display the first 5 records in the dataframe. The result resembles an Excel table and gives a glimpse of how the data looks: a description of the service request, the city department it was routed to, how the request was submitted, the date, address, street, ward, and a column labeled Sub-Type.

calldata.head()
Service Request Description Department Method Received Created Date Block/Address Street Ward Sub-Type
0 Building Accessibility Inspections Phone FEB 11, 2020 04:12:51 PM 25## COLLEGE WARD 2
1 Building Accessibility Inspections Phone FEB 10, 2020 03:06:46 PM 3## CHATHAM WARD 3
2 Building Accessibility Inspections E-Mail JAN 08, 2020 09:31:11 AM 85## RIVERSIDE WARD 6
3 Building Accessibility Inspections Phone JAN 13, 2020 08:59:10 AM 5## OUELLETTE WARD 3
4 Building Accessibility Inspections Phone FEB 10, 2020 04:02:03 PM 49## WYANDOTTE WARD 6
Checking .dtypes shows that every column is currently stored as a generic object (string) type:

calldata.dtypes
Service Request Description    object
Department                     object
Method Received                object
Created Date                   object
Block/Address                  object
Street                         object
Ward                           object
Sub-Type                       object
dtype: object

Step Two: Transform the Call Data in Preparation for Loading to the Database

Reviewing the first 5 rows, along with the data types, reveals a couple of issues that need to be addressed. I'll transform the data in the following code to head off any conflicts that may arise when loading it into a MySQL database.

First, 'Created Date' is stored as a string, and is also not in the standard ANSI format of YYYY-MM-DD HH:MM:SS. The Pandas .to_datetime function will parse and standardize the Created Date column appropriately.

calldata['Created Date'] = pd.to_datetime(calldata['Created Date'])

A quick way to check that the date range is reasonable, with no outliers, is to use the .min() and .max() functions to show the oldest and most recent dates in the dataset.

calldata['Created Date'].min()
Timestamp('2019-01-01 14:19:23')
calldata['Created Date'].max()
Timestamp('2020-05-25 15:55:18')
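As an extra guard, the same range check can be made programmatic by counting records that fall outside the expected window. A small sketch, shown on a stand-in frame so it runs standalone (the cutoff dates are my assumption based on the dataset's 2019-2020 coverage):

```python
import pandas as pd

# Stand-in for calldata, seeded with the actual min and max timestamps seen above.
calldata = pd.DataFrame({'Created Date': pd.to_datetime(
    ['2019-01-01 14:19:23', '2020-05-25 15:55:18'])})

# Flag anything outside the expected 2019-2020 window.
out_of_range = calldata[(calldata['Created Date'] < '2019-01-01') |
                        (calldata['Created Date'] > '2020-12-31')]
print(len(out_of_range))  # 0
```

Run against the full dataframe, a non-zero count here would flag bad dates before they ever reach the database.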

Next, the specific street address of each record has been partially redacted with a '##' placeholder for caller privacy. I'll replace the pound signs with zeroes, which leaves an approximate street block location for each 311 call.

It would also be a good idea to rename the 'Block/Address' column; that forward slash in the name could become problematic, so let's make it simply 'Block.'

calldata = calldata.rename(columns={'Block/Address': 'Block'})

calldata['Block'] = calldata.Block.str.replace('##', '00')

Recall from the .head() display that there were numerous blank cells in the data, especially in the Sub-Type column. Using the regular expression (regex) and NumPy below, I can strip out whitespace-only cells and fill them with Not a Number (NaN):

calldata = calldata.replace(r'^\s*$', np.nan, regex=True)

To get a handle on how many cells contain no information, the NaN (null) values per column can be summed with .sum(). This is helpful when deciding whether to delete a column: 'Sub-Type' has 41,718 blank records, almost 97% of the total dataset. Based on that, I'll drop this column entirely further down in the code.

calldata.isnull().sum()
Service Request Description        0
Department                         0
Method Received                    0
Created Date                       0
Block                           4916
Street                          1334
Ward                              79
Sub-Type                       41718
dtype: int64
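That percentage comes from dividing the null count by the row total, and it can be computed for every column at once. A quick sketch, shown on a small stand-in frame so it runs standalone; the same one-liner on calldata gives the real figures:

```python
import numpy as np
import pandas as pd

# Small stand-in frame with some deliberately missing cells.
calldata = pd.DataFrame({'Sub-Type': [np.nan, np.nan, np.nan, 'Trails'],
                         'Ward': ['WARD 1', np.nan, 'WARD 2', 'WARD 3']})

# Percentage of missing values per column.
null_pct = calldata.isnull().sum() / len(calldata) * 100
print(null_pct)  # Sub-Type: 75.0, Ward: 25.0
```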
Just like counting the blanks, we can get a quick summary of each value in a column, and how many times it appears, using .value_counts(). For the Method Received column there are four categories, with phone calls making up 90% of the way people contact 311, and only a trace of people choosing to text.

calldata['Method Received'].value_counts()
Phone         38716
E-Mail         2875
Web Intake     1338
Text            136
Name: Method Received, dtype: int64
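That 90% share can come straight from Pandas as well: passing normalize=True to .value_counts() returns proportions instead of raw counts. A stand-in sketch (9 phone calls and 1 e-mail, so the split is illustrative only):

```python
import pandas as pd

# Stand-in for the Method Received column.
methods = pd.Series(['Phone'] * 9 + ['E-Mail'], name='Method Received')

# normalize=True yields each value's share of the total.
shares = methods.value_counts(normalize=True) * 100
print(shares['Phone'])  # 90.0
```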

Bylaw Enforcement gets the most calls assigned to them, while Facilities gets the least.

calldata['Department'].value_counts()
Bylaw Enforcement                          12139
Environmental Services                      9719
Contracts, Field Services & Maintenance     8084
Forestry and Natural Areas                  5795
Inspections                                 3289
Parking Enforcement                         1935
Parks Operations                            1493
Humane Society                               366
Traffic                                      162
Facilities                                    83
Name: Department, dtype: int64

Dirty yards are the most frequent type of call, followed by tree requests and garbage not collected. Rodent concerns make up nearly 7% of all the calls!

calldata['Service Request Description'].value_counts()
Dirty Yard                               7367
Tree Request                             5795
Garbage Not Collected                    2891
Rodent Extermination                     2835
Pothole                                  2360
Building Conditions                      1630
Snow & Ice Removal Issues - Sidewalks    1606
3-Day Parking Infraction                 1528
Parks Maintenance                        1390
Garbage Preparation                      1284
Snow Plowing & Salting Request           1117
Recycling Not Collected                  1081
Public Property Obstructions              997
Dog Complaint - Bylaw                     977
Sewer Maintenance                         929
Dead Animal Removal                       923
Yard Waste Not Collected                  785
Road Maintenance                          639
Road Clean-Up Request                     612
Building/Land - Improper Use              580
Noise                                     575
Abandoned Vehicle                         407
Sidewalk Construction & Repair            404
Downspout Disconnect                      362
Swimming Pool                             355
Alley Maintenance                         355
Street Sweeping                           328
Fence & Hedge                             300
Dead Animal Removal - Humane Society      291
Property Flooding / Grading               261
Snow Removal - Emergency                  226
Illegal Dumping Public Property           176
Parking Meter                             162
Road Cave-In                              161
Keeping of Animals                        154
Sewer Project Restoration                 136
Curb Maintenance                          122
Drainage                                  121
Protection of Parks                       109
Parks - Playground Issues                 100
Construction Site Conditions              100
Shoulder Maintenance                       97
Parks - Lighting                           83
Dog Complaint - Humane Society             75
Sewer Project                              58
Litter Bin - Request New                   42
Portable Sign Issue                        36
Litter Bin - Service                       34
Sign Issue                                 33
Snow & Ice - Dumping on Road/Alley         31
Building Accessibility                     29
Skunk Inspection                           12
Graffiti                                    3
Private Parking Lot                         1
Name: Service Request Description, dtype: int64
Sub-Type is a column that seems to have been abandoned, as most of its cells are empty. Interesting to note that the Peace Fountain got 4 calls, and I'm kinda shocked to see only 8 calls for Goose Poop Clean Up.

calldata['Sub-Type'].value_counts()

Maintenance/Other                                               552
Grass Cutting                                                   279
Garbage Bin Emptying                                            108
Snow Removal                                                     83
Fence Repair                                                     57
Trails                                                           38
Garbage Bin Requested                                            33
Grass Cutting,Maintenance/Other                                  24
Maintenance/Other,                                               22
Washrooms                                                        18
Bench Repair/Service                                             15
Maintenance/Other,Trails                                         10
Garbage Bin Emptying,Maintenance/Other                           10
Signage Repair                                                    9
Goose Poop Clean Up                                               8
Splash Pad                                                        7
Garbage Bin Requested,Maintenance/Other                           7
Garbage Bin Emptying,                                             6
Bench Repair/Service,Maintenance/Other                            5
Maintenance/Other,Signage Repair                                  5
Peace Fountain                                                    4
Garbage Bin Emptying,Garbage Bin Requested                        4
Maintenance/Other,Grass Cutting                                   4
Grass Cutting,                                                    4
Grass Cutting,Trails                                              2
Bike Repair Station                                               2
Fence Repair,Maintenance/Other                                    2
Maintenance/Other,Washrooms                                       2
Trails,Maintenance/Other                                          2
Front Yard,Back Yard                                              2
Maintenance/Other,Snow Removal                                    2
Garbage Bin Emptying,Grass Cutting                                2
Snow Removal,Trails                                               2
Fence Repair,Grass Cutting                                        2
Bench Repair/Service,Garbage Bin Requested,Maintenance/Other      1
Fence Repair,Trails                                               1
Sprinkler System                                                  1
Splash Pad,Maintenance/Other                                      1
Garbage Bin Emptying,Grass Cutting,Signage Repair,                1
Garbage Bin Requested,Grass Cutting,                              1
Bench Repair/Service,Maintenance/Other,Washrooms                  1
Fence Repair,                                                     1
Splash Pad,Grass Cutting,Maintenance/Other                        1
Middle of road                                                    1
Maintenance/Other,Garbage Bin Emptying                            1
Garbage Bin Emptying,Garbage Bin Requested,Maintenance/Other      1
Signage - New Request                                             1
Maintenance/Other,Fence Repair                                    1
Washrooms,Maintenance/Other                                       1
Name: Sub-Type, dtype: int64
Since Sub-Type is mostly blank, it gets deleted with .drop().

calldata = calldata.drop(columns=['Sub-Type'])

And that wraps up the time-consuming data transformation phase.

Step Three: Load the Dataframe into an Existing MySQL Database

I have a MySQL database running on an AWS instance, and I'll load this data into it right here in the same Jupyter Notebook. The plan is to reuse this code to replicate the workflow when the next batch of 311 data is made available.

SQLAlchemy is the library I'll use to load the dataframe from the Jupyter Notebook. It's an ORM (Object Relational Mapper), and I'll use it to manage connectivity to the database, interact with queries and results, and construct the SQL statements that create, insert, update, and delete records.

SQLAlchemy will create the SQL table automatically, but if you wanted to see the longest string in each column to determine the minimum VARCHAR length each SQL column requires, the code below provides that answer. It's not strictly needed, but I loaded NumPy earlier, so I might as well use it once more.

# Check for longest field lengths for SQL table creation activities.
columnLengths = np.vectorize(len)
maxlen = columnLengths(calldata.values.astype(str)).max(axis=0)
print('Max Column Lengths ', maxlen)
Max Column Lengths  [37 39 10 19  6 40  7]
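Those lengths map directly onto column definitions. If you did want to hand-build the table, something like the sketch below would pair each column with a wide-enough VARCHAR (the varchar_spec name is my own, and the two-column frame is a stand-in so the snippet runs standalone):

```python
import numpy as np
import pandas as pd

# Stand-in frame; on the real calldata the lengths were [37 39 10 19 6 40 7].
calldata = pd.DataFrame({'Street': ['COLLEGE', 'WYANDOTTE'],
                         'Ward': ['WARD 2', 'WARD 6']})

columnLengths = np.vectorize(len)
maxlen = columnLengths(calldata.values.astype(str)).max(axis=0)

# Pair each column name with a VARCHAR wide enough for its longest value.
varchar_spec = {col: f'VARCHAR({n})' for col, n in zip(calldata.columns, maxlen)}
print(varchar_spec)  # {'Street': 'VARCHAR(9)', 'Ward': 'VARCHAR(6)'}
```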

First, initialize the engine to start up the database connectivity.

# import the module
from sqlalchemy import create_engine

Next, the database and user credentials are assigned to the engine. I'll be connecting to my own AWS MySQL database using its IP address, my admin user name, password, and a pre-existing database called windsor311.

# create sqlalchemy engine
engine = create_engine("mysql+pymysql://{user}:{pw}@ip.to.aws.db/{db}"
                       .format(user="db_user_name",
                               pw="db_password",
                               db="windsor311"))

The last step is to insert the rows from the calldata Pandas dataframe into the windsor311 database. Again, no need to create a table first; just give it a name ('calltable') and SQLAlchemy will create it via the Pandas .to_sql function.

# Insert whole CallData DataFrame into MySQL
# Will create a table called 'calltable' in the windsor311 
# database.

calldata.to_sql('calltable', con=engine, if_exists='append', chunksize=500, index=False)

# when complete, release the database connection pool
engine.dispose()

Hop over to MySQL Workbench to check out the results. All looks good.
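The check can also be done without leaving the notebook by querying the row count back through Pandas. The sketch below uses an in-memory SQLite connection so it runs anywhere; against the real database you would pass the MySQL engine created earlier instead:

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the windsor311 database, loaded with two sample rows.
conn = sqlite3.connect(':memory:')
pd.DataFrame({'Ward': ['WARD 2', 'WARD 3']}).to_sql('calltable', con=conn, index=False)

# Row count straight from the database.
rowcount = pd.read_sql('SELECT COUNT(*) AS n FROM calltable', con=conn)['n'].iloc[0]
print(rowcount)  # 2
```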

The data is now successfully loaded into MySQL. When new 311 call data is released I can simply point to their new CSV file in this existing code and the workflow to append the MySQL table will be totally automated.
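To make that repeat run as painless as possible, the cleaning steps from Step Two can be bundled into one function that every future CSV batch passes through before .to_sql. A sketch of that idea (the transform_calls name is mine, not part of the original notebook):

```python
import numpy as np
import pandas as pd

def transform_calls(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the Step Two cleaning to a freshly downloaded batch of 311 calls."""
    df = raw.rename(columns={'Block/Address': 'Block'})
    df['Created Date'] = pd.to_datetime(df['Created Date'])
    df['Block'] = df.Block.str.replace('##', '00')      # de-redact to block level
    df = df.replace(r'^\s*$', np.nan, regex=True)       # whitespace-only -> NaN
    return df.drop(columns=['Sub-Type'])                # mostly-empty column

# When the next file is released, point read_csv at it and append:
# new_batch = transform_calls(pd.read_csv(
#     'https://opendata.citywindsor.ca/Uploads/AllServiceRequests_YTD.csv'))
# new_batch.to_sql('calltable', con=engine, if_exists='append', index=False)
```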

In Part 2 of this article, I'll use this same database to create a visualization dashboard with Tableau.


Ken Hudak is a Data Analyst in Windsor, Ontario. Using Python, SQL, Alteryx, and ArcGIS, he creates business intelligence products from a myriad of data sources. More info about him can be found at KenHudak.com or on LinkedIn.