Seattle Crime Analysis

Exploratory data analysis using R programming

View the Project on GitHub sumeet144/CrimeIncidents

About Project

The dataset for this exploratory data analysis comes from data.seattle.gov and was downloaded as a csv file for analysis in R. It contains details of crime incidents recorded by Seattle police officers when responding to incidents around the city. The information recorded by officers is entered into the Records Management System (RMS) of the Seattle Police Department and then made publicly available on the data.seattle.gov website. The dataset contains all incidents reported from 1965 to April 2015. The csv file contains 570,645 records with the following variables:

Data Cleaning

I cleaned the dataset for further analysis by casting variables to appropriate data types: the latitude and longitude variables were changed from character to numeric, and the date variable was converted to a date format. I also dropped repeated variables to tidy the dataset for quicker, more efficient analysis.
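
As a rough illustration of the cleaning steps described above, the sketch below converts character columns to numeric and date types and drops a repeated column. The toy data frame, its values, and the `location` column are assumptions made for the example; only `latitude`, `longitude`, and `crime.time` appear in the code on this page.

```r
# Toy stand-in for the raw csv; values and the 'location' column are hypothetical
raw <- data.frame(
  crime.time = c("04/01/2015 13:45", "03/15/2015 02:10"),
  latitude   = c("47.6097", "47.6205"),
  longitude  = c("-122.3331", "-122.3493"),
  location   = c("(47.6097, -122.3331)", "(47.6205, -122.3493)"),
  stringsAsFactors = FALSE
)

clean <- raw
# Typecast the coordinates from character to numeric
clean$latitude  <- as.numeric(clean$latitude)
clean$longitude <- as.numeric(clean$longitude)
# Parse the timestamp string into a date-time, then keep the calendar date
clean$date <- as.Date(as.POSIXct(clean$crime.time,
                                 format = "%m/%d/%Y %H:%M", tz = "UTC"))
# Drop a column that only repeats information held elsewhere
clean$location <- NULL

str(clean)
```

Working on a copy (`clean`) keeps the raw import untouched, which makes it easy to compare before and after.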

Exploratory Data Analysis

The following charts are drawn to explore possible relationships or patterns in the data.

(a) Line Plot – The visualization below presents the frequency of crimes by time of day. There are approximately 61 categories of crime incidents in the dataset. I took a subset and prepared this visualization for the top three crime categories: Car Prowl, Other Property, and Burglary. Using the R code below, I divided the day into four 6-hour intervals: night (22:00 to 03:59), morning (04:00 to 09:59), midday (10:00 to 15:59), and evening (16:00 to 21:59). I then counted the number of crimes that occurred in each interval for Car Prowl, Other Property, and Burglary, from 1965 to April 2015. The resulting table (‘table1’ in the code) holds these counts. This type of visualization shows how the frequency of the top three crimes varies across the day.

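# Load the incident data exported from data.seattle.gov
crime.seattle <- read.csv(file = "crime.csv", header = TRUE, sep = ",")
# Parse the timestamp; POSIXct is safer than POSIXlt in a data frame column,
# and tz = "UTC" avoids NAs from daylight-saving gaps
crime.seattle$date <- as.POSIXct(crime.seattle$crime.time,
                                 format = "%m/%d/%Y %H:%M", tz = "UTC")
# Extract the two-digit hour ("00" to "23"); format() keeps midnight as "00"
crime.seattle$time <- format(crime.seattle$date, "%H")
# Recode each hour into one of four 6-hour periods
crime.seattle$time[crime.seattle$time %in% c("22", "23", "00",
                                             "01", "02", "03")] <- "night"
crime.seattle$time[crime.seattle$time %in% c("04", "05", "06",
                                             "07", "08", "09")] <- "morning"
crime.seattle$time[crime.seattle$time %in% c("10", "11", "12",
                                             "13", "14", "15")] <- "midday"
crime.seattle$time[crime.seattle$time %in% c("16", "17", "18",
                                             "19", "20", "21")] <- "evening"
# Cross-tabulate period of day against crime category
time.table <- table(crime.seattle$time, crime.seattle$crime.desc)
# Reorder the rows chronologically instead of alphabetically
time <- factor(rownames(time.table),
               levels = c("morning", "midday",
                          "evening", "night"))
sorted.time.table <- time.table[order(time), ]
# Keep the three most frequent categories
table1 <- sorted.time.table[ , c("CAR PROWL", "BURGLARY",
                                 "OTHER PROPERTY")]
category <- c("morning", "midday", "evening", "night")
# One line per category; the x axis is drawn separately with period labels
matplot(table1, type = "l", main = "Thefts by Time of Day",
        xlab = "Section of Day", ylab = "Frequency",
        xaxt = "n")
axis(side = 1, at = 1:4, labels = category)
# matplot uses colours and line types 1, 2, 3 for the three columns by default
legend("topright", legend = colnames(table1), col = 1:3, lty = 1:3)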

Top 3 crimes by day
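
The hour-to-period mapping used above can also be expressed as a small lookup function. This is a minimal sketch on made-up timestamps, not the code that produced the chart; `period_of_day` is a hypothetical helper name.

```r
# Map an integer hour (0-23) to the four periods used in the line plot
period_of_day <- function(hour) {
  period <- rep(NA_character_, length(hour))
  period[hour >= 22 | hour <= 3]  <- "night"
  period[hour >= 4  & hour <= 9]  <- "morning"
  period[hour >= 10 & hour <= 15] <- "midday"
  period[hour >= 16 & hour <= 21] <- "evening"
  # Chronological level order, so tables need no re-sorting afterwards
  factor(period, levels = c("morning", "midday", "evening", "night"))
}

# Made-up timestamps in the same format as the crime.time column
stamps <- c("04/01/2015 00:00", "04/01/2015 08:30",
            "04/01/2015 12:15", "04/01/2015 23:59")
hours <- as.integer(format(as.POSIXct(stamps, format = "%m/%d/%Y %H:%M",
                                      tz = "UTC"), "%H"))
periods <- period_of_day(hours)
print(table(periods))
```

Because the function returns a factor with the levels in chronological order, a call to `table()` on its result already lists the periods from morning to night.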

(b) Map Plot – This visualization shows where incidents are concentrated on a city map and how the crime categories are distributed across it. Using the location values in the dataset and the ggmap package, which retrieves Google Maps tiles, the R code below plots a sample of incidents on a map of Seattle, colored by crime description code.

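library(ggmap)  # provides get_map() and ggmap(); also attaches ggplot2
# Note: recent ggmap versions need a Google API key, set via register_google()

# Sample of incidents: columns 15 and 16 hold longitude and latitude
crime.sample <- crime.seattle[2:1000, 15:16]
# Column 6 holds the crime description code; store it in the same data frame
# and treat it as categorical so each code gets its own colour
crime.sample$crime.desc <- factor(crime.seattle[2:1000, 6])
# Black-and-white Google road map centred on Seattle
basemap <- get_map(location = "Seattle, USA", zoom = 11,
                   maptype = "roadmap", color = "bw", source = "google")
map1 <- ggmap(basemap, extent = "panel",
              base_layer = ggplot(crime.sample, aes(x = longitude,
                                                    y = latitude)))
# First map: every incident in a single colour
map.seattle <- map1 + geom_point(color = "blue", size = 4)
map.seattle <- map.seattle + labs(title = "Seattle Crime Area",
                                  x = "Longitude", y = "Latitude")
map.seattle <- map.seattle +
               theme(plot.title = element_text(hjust = 0,
                                               vjust = 1,
                                               face = "bold"))
# Second map: incidents coloured by crime description code
map.survey <- map1 + geom_point(aes(color = crime.desc),
                                size = 4, alpha = .8)
map.survey <- map.survey + labs(title = "Seattle Crime Area Map",
                                x = "Longitude", y = "Latitude",
                                color = "Crime Desc Code")
map.survey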
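
Rows 2 to 1000 are simply the first records in the file, so the map reflects whatever order the export happens to be in. If a more representative subset were wanted, one option (not something done for the map above) is a random sample with a fixed seed, taken after dropping records without coordinates. The data frame below is synthetic and its coordinate ranges are made up.

```r
set.seed(42)  # fixed seed so the same rows are drawn on every run

# Synthetic stand-in for crime.seattle; coordinates are invented
toy <- data.frame(
  crime.desc = sample(c("CAR PROWL", "BURGLARY", "OTHER PROPERTY"),
                      5000, replace = TRUE),
  longitude  = runif(5000, -122.44, -122.24),
  latitude   = runif(5000, 47.50, 47.73)
)
toy$latitude[1:10] <- NA  # pretend a few records have no location

# Keep only rows with usable coordinates, then draw 999 of them at random
located <- toy[complete.cases(toy[, c("longitude", "latitude")]), ]
crime.sample <- located[sample(nrow(located), 999), ]

nrow(crime.sample)
```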

(c) Bar Chart – The bar chart explores the frequency of crime incidents by category, showing which crimes are reported most often in the Seattle area.

# Count incidents per category and draw the bars from most to least frequent
barplot(sort(table(crime.seattle$crime.desc), decreasing=TRUE), 
        main="Seattle Crime Incidents Numbers",
        xlab="Crime", ylab="Number of times crime committed")
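
With around 61 categories, the axis labels crowd each other. A possible variation, sketched here on synthetic counts rather than the real data, keeps only the most frequent categories and draws the bars horizontally. The category names other than the three named in this write-up are placeholders.

```r
set.seed(1)
# Synthetic offence labels: three real category names plus placeholders
labels <- c("CAR PROWL", "OTHER PROPERTY", "BURGLARY",
            paste("CATEGORY", 1:12))
toy.desc <- sample(labels, 2000, replace = TRUE,
                   prob = c(30, 25, 20, rep(2, 12)))

# Frequency table, most common first, truncated to the top 10
top10 <- head(sort(table(toy.desc), decreasing = TRUE), 10)

# Horizontal bars with readable labels; rev() puts the largest bar on top
op <- par(mar = c(4, 9, 3, 1))
barplot(rev(top10), horiz = TRUE, las = 1, cex.names = 0.7,
        main = "Top 10 categories (synthetic data)",
        xlab = "Number of incidents")
par(op)
```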

(d) Pie Chart – The following pie chart is generated using the R code below. It illustrates the distribution of the different types of crime in the Seattle area, showing the proportion of each category in the dataset. Although this visualization clearly shows the proportions of the most frequent categories, such as ‘Car Prowl’, ‘Other Property’, and ‘Burglary’, the categories with smaller proportions are not clearly visible, which makes them hard to compare or to record observations about.

crime.seattle <- read.csv(file = "crime.csv", header = TRUE, sep = ",")
# Count incidents per category, most frequent first
crime.freq <- table(crime.seattle$crime.desc)
crime.freq.desc <- sort(crime.freq, decreasing = TRUE)
# Blank the names of every category after the top five to reduce clutter
names(crime.freq.desc)[-seq_len(5)] <- ""
# Percentage share of each category, rounded to one decimal place
percentlabels <- round(100 * crime.freq.desc / sum(crime.freq.desc), 1)
chartlabels <- paste(names(crime.freq.desc), " ",
                     percentlabels, "%", sep = "")
pie(crime.freq.desc, main = "Types of crime in Seattle",
    col = rainbow(10), labels = chartlabels)
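
One way to address the unreadable small slices noted above is to fold everything outside the top few categories into a single "OTHER" slice. The sketch below uses invented counts, and `lump_small` is a hypothetical helper rather than part of the original analysis.

```r
# Collapse all but the n largest entries of a named count vector into "OTHER"
lump_small <- function(counts, n = 5) {
  counts <- sort(counts, decreasing = TRUE)
  if (length(counts) <= n) return(counts)
  c(counts[seq_len(n)], OTHER = sum(counts[-seq_len(n)]))
}

# Invented counts for illustration only
toy.counts <- c("CAR PROWL" = 300, "OTHER PROPERTY" = 250, "BURGLARY" = 200,
                "CATEGORY A" = 40, "CATEGORY B" = 30, "CATEGORY C" = 20,
                "CATEGORY D" = 10)
lumped <- lump_small(toy.counts, n = 3)

# Label each slice with its name and percentage share
pct <- round(100 * lumped / sum(lumped), 1)
pie(lumped, labels = paste0(names(lumped), " ", pct, "%"),
    main = "Types of crime (synthetic counts)")
```

The total is preserved, so the percentages still sum to the whole dataset while the chart shows only a handful of slices.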

Results

The final visualization selected to represent this dataset is the line plot. This is for the following reasons: