Analyzing crime in Chicago from 2012–2017 with Decision Trees, Logistic Models, and Random Forest Classifiers.

By: Carley Williams

[Image: Chicago skyline]

For this project, I used Kaggle’s dataset: Chicago Crimes 2012–2017. Before applying any models, I first needed to clean and explore my data.

Step 1: Exploring Data

To begin the exploration process, I followed many of the same steps I took in my previous publication from March 7th, “Analyzing Chicago’s Crime Rates”.

This dataset has 1,456,714 rows and 23 columns. Each row is a unique crime, and the 23 columns describe it: everything from latitude and longitude to FBI code to whether an arrest was made.

For the purposes of my project, I narrowed things down to…


Taking a deep dive into Chicago’s crimes from 2012–2017


PHASE 1: Exploring the shape & structure of the data

Step 1: Gaining a higher-level understanding of the data

To begin, I looked at the shape of my data: 1,456,714 rows and 23 columns. The columns, left to right, are Unnamed, ID, Case Number, Date, Block, IUCR, Primary Type, Description, Location Description, Arrest, Domestic, Beat, District, Ward, Community Area, FBI Code, X Coordinate, Y Coordinate, Year, Updated On, Latitude, Longitude, and Location.
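This first look at the shape and column names can be sketched in pandas. The file name below is an assumption about the Kaggle download, and a tiny inline sample stands in for the full 1,456,714-row file so the snippet is self-contained:

```python
import io
import pandas as pd

# In the real workflow this would be something like:
#   df = pd.read_csv("Chicago_Crimes_2012_to_2017.csv")
# (file name assumed). A small inline sample mimics the structure here.
sample = io.StringIO(
    "Unnamed: 0,ID,Case Number,Primary Type,Arrest,Year\n"
    "123,10000092,HY189866,BATTERY,False,2015\n"
    "456,10000094,HY190059,THEFT,True,2016\n"
)
df = pd.read_csv(sample)

print(df.shape)          # (rows, columns); (1456714, 23) on the full dataset
print(list(df.columns))  # column names, left to right
```

`df.shape` and `df.columns` are the two quickest ways to confirm the size and layout of a freshly loaded dataset before any cleaning begins.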

Step 2: Finding data quality issues

ISSUE ONE: The first column was unnamed and contained random digits for each row. As there was already an index, I deleted this column.
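A minimal sketch of that fix, using a toy DataFrame in place of the real one (pandas names a stray first column "Unnamed: 0" when the CSV header cell is blank; the exact label in the real file is an assumption):

```python
import pandas as pd

# Toy stand-in for the real DataFrame: the first column duplicates the index.
df = pd.DataFrame({
    "Unnamed: 0": [123, 456],   # random digits, redundant with the index
    "ID": [10000092, 10000094],
    "Primary Type": ["BATTERY", "THEFT"],
})

# Drop the redundant column since pandas already provides an index.
df = df.drop(columns=["Unnamed: 0"])
print(list(df.columns))  # ['ID', 'Primary Type']
```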

ISSUE TWO: Near the right end of the…
