By: Carley Williams
For this project, I used Kaggle’s dataset: Chicago Crimes 2012–2017. Before applying any models, I first needed to clean and explore my data.
To begin the exploration process, I followed many of the same steps I took in my previous publication from March 7th, “Analyzing Chicago’s Crime Rates”.
This dataset has 1,456,714 rows and 23 columns. Each row is a unique crime, and each of the 23 columns adds detail about it, from latitude and longitude to FBI code to arrest status.
For my project, I narrowed things down to…
Taking a deep dive into Chicago’s crimes from 2012–2017
To begin, I looked at the shape of my data: there are 1,456,714 rows and 23 columns. From left to right, these columns are Unnamed, ID, Case Number, Date, Block, IUCR, Primary Type, Description, Location Description, Arrest, Domestic, Beat, District, Ward, Community Area, FBI Code, X Coordinate, Y Coordinate, Year, Updated On, Latitude, Longitude, and Location.
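For readers who want to follow along, here is a minimal sketch of that shape check. It assumes pandas, which the post does not name explicitly, and it reads a tiny made-up CSV from memory instead of the real Kaggle file, so the rows and the shortened column list are placeholders only. A blank first header is what typically produces an "Unnamed: 0" column when pandas loads a CSV.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV: two invented rows and only a
# few of the 23 columns. The empty first header mimics a saved index column.
sample_csv = io.StringIO(
    ",ID,Primary Type,Arrest\n"
    "3,100,THEFT,False\n"
    "89,101,BATTERY,True\n"
)

crimes = pd.read_csv(sample_csv)

# .shape is a (rows, columns) tuple; .columns lists names left to right.
n_rows, n_cols = crimes.shape
column_names = list(crimes.columns)

print(f"{n_rows} rows, {n_cols} columns")
print(column_names)
```

On the full dataset, the same two attributes would give the row and column counts quoted above.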
ISSUE ONE: The first column was unnamed and held seemingly arbitrary numbers for each row. Because the data already had an index, I deleted this column.
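One way this deletion might look is sketched below, again assuming pandas. The small frame and the exact column label "Unnamed: 0" are illustrative assumptions, since the post only calls the column "Unnamed".

```python
import pandas as pd

# Hypothetical miniature of the dataset with the leftover unnamed column.
crimes = pd.DataFrame(
    {
        "Unnamed: 0": [3, 89, 197],
        "ID": [100, 101, 102],
        "Primary Type": ["THEFT", "BATTERY", "THEFT"],
    }
)

# Collect any column whose label starts with "Unnamed", then drop them.
# drop() returns a new frame, so the original is left untouched.
unnamed_cols = [c for c in crimes.columns if str(c).startswith("Unnamed")]
cleaned = crimes.drop(columns=unnamed_cols)

print(list(cleaned.columns))
```

Matching on the prefix rather than hard-coding the label keeps the sketch working if the column loads under a slightly different name.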
ISSUE TWO: Near the right end of the…