Chicago Crime Dataset
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. Each entry includes precise time and location details, supporting fine-grained spatial and temporal analysis. For benchmarking, a subset of reports from 2022 is used, while raw multi-year data is also provided without predefined train–test splits.
In [1]:
# plotting imports
import contextily as cx
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.patches import Patch
# dataset import
from srai.datasets import ChicagoCrimeDataset
In [2]:
chicago_crime = ChicagoCrimeDataset()
Load default data
In [3]:
ds = chicago_crime.load()
ds.keys()
Out[3]:
dict_keys(['train', 'test'])
In [4]:
type(chicago_crime.train_gdf), type(chicago_crime.test_gdf)
Out[4]:
(geopandas.geodataframe.GeoDataFrame, geopandas.geodataframe.GeoDataFrame)
In [5]:
print("Aggregation H3 resolution:", chicago_crime.resolution)
Aggregation H3 resolution: 9
In [6]:
print("Prediction target:", chicago_crime.target)
Prediction target: count
In [7]:
gdf_train, gdf_test = ds["train"], ds["test"]
In [8]:
gdf_train.head()
Out[8]:
 | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | Year | Updated On | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 12589893 | JF109865 | 01/11/2022 03:00:00 PM | 087XX S KINGSTON AVE | 1565 | SEX OFFENSE | INDECENT SOLICITATION OF A CHILD | RESIDENCE | False | True | 423 | 4 | 7.0 | 46.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.56241 41.73641) |
1 | 12592454 | JF113025 | 01/14/2022 03:55:00 PM | 067XX S MORGAN ST | 2826 | OTHER OFFENSE | HARASSMENT BY ELECTRONIC MEANS | RESIDENCE | False | True | 724 | 7 | 16.0 | 68.0 | 26 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.64944 41.77178) |
2 | 12785595 | JF346553 | 08/05/2022 09:00:00 PM | 072XX S UNIVERSITY AVE | 1544 | SEX OFFENSE | SEXUAL EXPLOITATION OF A CHILD | APARTMENT | True | False | 324 | 3 | 5.0 | 69.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.597 41.76334) |
3 | 12808281 | JF373517 | 08/14/2022 02:00:00 PM | 055XX W ARDMORE AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | RESIDENCE | False | False | 1621 | 16 | 39.0 | 11.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.7664 41.98588) |
4 | 12888104 | JF469015 | 11/10/2022 03:47:00 AM | 072XX S MAY ST | 1477 | WEAPONS VIOLATION | RECKLESS FIREARM DISCHARGE | STREET | False | False | 733 | 7 | 17.0 | 68.0 | 15 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.65284 41.76261) |
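The `Date` and `Updated On` columns above are displayed as text in a 12-hour `MM/DD/YYYY hh:mm:ss AM/PM` layout, so temporal analysis usually starts by parsing them. A minimal standard-library sketch, using two values copied from the table; the format string is inferred from the displayed values and is an assumption, not something stated by the dataset documentation:

```python
from datetime import datetime

# Assumed format, inferred from the table: month/day/year with a 12-hour clock.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Two "Date" values copied from the rows shown above.
raw_dates = ["01/11/2022 03:00:00 PM", "11/10/2022 03:47:00 AM"]

parsed = [datetime.strptime(value, DATE_FORMAT) for value in raw_dates]

for value in parsed:
    # ISO timestamps are unambiguous and sort correctly as strings.
    print(value.isoformat(), "| hour of day:", value.hour)
```

For a whole column, the same format string can be passed to `pandas.to_datetime` through its `format` argument.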
Get target values for H3 cells
In [9]:
train_h3, _, test_h3 = chicago_crime.get_h3_with_labels()
In [10]:
train_h3.head()
Out[10]:
region_id | geometry | count |
---|---|---|
892664cc143ffff | POLYGON ((-87.61025 41.76848, -87.61206 41.767... | 0.104348 |
892664cce0bffff | POLYGON ((-87.61168 41.74644, -87.61349 41.745... | 0.094410 |
892664cb063ffff | POLYGON ((-87.80618 41.94159, -87.80799 41.940... | 0.034783 |
892664ca613ffff | POLYGON ((-87.73913 41.95981, -87.74095 41.958... | 0.007453 |
892664cdeb7ffff | POLYGON ((-87.67165 41.75393, -87.67346 41.752... | 0.099379 |
In [11]:
test_h3.head()
Out[11]:
region_id | geometry | count |
---|---|---|
892664ccd27ffff | POLYGON ((-87.54947 41.7256, -87.55128 41.7244... | 0.008696 |
892664cb303ffff | POLYGON ((-87.80807 41.92763, -87.80989 41.926... | 0.003727 |
892664c8883ffff | POLYGON ((-87.72613 41.84962, -87.72795 41.848... | 0.047205 |
892664cc437ffff | POLYGON ((-87.60214 41.79698, -87.60395 41.795... | 0.027329 |
892664562d3ffff | POLYGON ((-87.66622 41.69198, -87.66803 41.690... | 0.027329 |
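The `count` values above are fractions rather than whole numbers, which suggests that the per-cell incident counts are rescaled before they are returned. This page does not say how. The sketch below illustrates one plausible scheme: count the points that fall in each cell, then divide by the largest count. The point-to-cell assignments in `point_cells` are invented for illustration, and the scaling rule is an assumption rather than the documented behaviour of `get_h3_with_labels`.

```python
from collections import Counter

# Hypothetical assignment of six incidents to two H3 cells.
# The cell IDs are borrowed from the table above; the assignments are made up.
point_cells = [
    "892664cc143ffff",
    "892664cc143ffff",
    "892664cc143ffff",
    "892664cc143ffff",
    "892664cce0bffff",
    "892664cce0bffff",
]

# Step 1: raw number of incidents per cell.
raw_counts = Counter(point_cells)

# Step 2 (assumed scaling): divide by the largest count so values lie in (0, 1].
max_count = max(raw_counts.values())
scaled = {cell: n / max_count for cell, n in raw_counts.items()}

print(scaled)
```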
In [12]:
fig, axes = plt.subplots(
    2, 1, sharex=False, sharey=False, figsize=(12, 19), height_ratios=[4, 1]
)
train_h3.plot(
    color="orange",
    markersize=0.1,
    ax=axes[0],
    label="train",
    alpha=np.minimum(np.power(train_h3[chicago_crime.target] + 0.4, 2), 1),
)
test_h3.plot(
    color="royalblue",
    markersize=0.1,
    ax=axes[0],
    label="test",
    alpha=np.minimum(np.power(test_h3[chicago_crime.target] + 0.4, 2), 1),
)
cx.add_basemap(axes[0], source=cx.providers.CartoDB.PositronNoLabels, crs=4326, zoom=12)
axes[0].set_title("Chicago crime data aggregated to H3 cells")
axes[0].legend(
    handles=[Patch(facecolor="orange"), Patch(facecolor="royalblue")],
    labels=["Train", "Test"],
)
axes[0].set_axis_off()
sns.kdeplot(
    x=train_h3[chicago_crime.target],
    label="train",
    color="orange",
    ax=axes[1],
    fill=False,
    cut=0,
)
sns.kdeplot(
    x=test_h3[chicago_crime.target],
    label="test",
    color="royalblue",
    ax=axes[1],
    fill=False,
    cut=0,
)
axes[1].set_title("Chicago crime data - target distribution")
axes[1].legend()
fig.tight_layout()
plt.show()
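The `alpha` argument in the plotting cell turns each cell's target value into an opacity: the value is shifted by 0.4, squared, and capped at 1. Cells with a target near zero therefore stay faintly visible (opacity about 0.16), and any cell with a target of 0.6 or more is drawn fully opaque. The pure-Python sketch below applies the same formula to one value at a time; `cell_alpha` is a hypothetical helper name, and the sample values are illustrative (two of them appear in the tables above).

```python
def cell_alpha(target: float) -> float:
    """Scalar version of np.minimum(np.power(target + 0.4, 2), 1)."""
    return min((target + 0.4) ** 2, 1.0)


# Illustrative target values; 0.007453 and 0.104348 appear in the H3 tables above.
samples = [0.0, 0.007453, 0.104348, 0.6, 1.0]

for value in samples:
    print(f"target={value:.6f} -> alpha={cell_alpha(value):.4f}")
```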
Load data from 2022
In [13]:
ds = chicago_crime.load(version="2022")
ds.keys()
Out[13]:
dict_keys(['train'])
In [14]:
type(chicago_crime.train_gdf), type(chicago_crime.test_gdf)
Out[14]:
(geopandas.geodataframe.GeoDataFrame, NoneType)
In [15]:
ds["train"].head()
Out[15]:
 | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | Year | Updated On | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 12589893 | JF109865 | 01/11/2022 03:00:00 PM | 087XX S KINGSTON AVE | 1565 | SEX OFFENSE | INDECENT SOLICITATION OF A CHILD | RESIDENCE | False | True | 423 | 4 | 7.0 | 46.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.56241 41.73641) |
1 | 12592454 | JF113025 | 01/14/2022 03:55:00 PM | 067XX S MORGAN ST | 2826 | OTHER OFFENSE | HARASSMENT BY ELECTRONIC MEANS | RESIDENCE | False | True | 724 | 7 | 16.0 | 68.0 | 26 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.64944 41.77178) |
2 | 12785595 | JF346553 | 08/05/2022 09:00:00 PM | 072XX S UNIVERSITY AVE | 1544 | SEX OFFENSE | SEXUAL EXPLOITATION OF A CHILD | APARTMENT | True | False | 324 | 3 | 5.0 | 69.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.597 41.76334) |
3 | 12808281 | JF373517 | 08/14/2022 02:00:00 PM | 055XX W ARDMORE AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | RESIDENCE | False | False | 1621 | 16 | 39.0 | 11.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.7664 41.98588) |
4 | 12888104 | JF469015 | 11/10/2022 03:47:00 AM | 072XX S MAY ST | 1477 | WEAPONS VIOLATION | RECKLESS FIREARM DISCHARGE | STREET | False | False | 733 | 7 | 17.0 | 68.0 | 15 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.65284 41.76261) |
Create your own train-test split: spatial splitting with bucket stratification
In [16]:
train, test = chicago_crime.train_test_split(
    test_size=0.2, random_state=42, n_bins=10, resolution=9
)
Summary of the split:
Train: 3886 H3 cells (187991 points)
Test: 1075 H3 cells (46928 points)

Expected ratios: {'train': 0.8, 'validation': 0, 'test': 0.2}
Actual ratios: {'train': 0.8, 'test': 0.2}
Actual ratios difference: {'train': 0.0, 'test': 0.0}

bucket | train_ratio | test_ratio | train_ratio_difference | test_ratio_difference | train_points | test_points |
---|---|---|---|---|---|---|
0 | 0.80017 | 0.19983 | -0.00017 | 0.00017 | 965 | 241 |
1 | 0.79926 | 0.20074 | 0.00074 | -0.00074 | 3038 | 763 |
2 | 0.80101 | 0.19899 | -0.00101 | 0.00101 | 5531 | 1374 |
3 | 0.80010 | 0.19990 | -0.00010 | 0.00010 | 7825 | 1955 |
4 | 0.79906 | 0.20094 | 0.00094 | -0.00094 | 11095 | 2790 |
5 | 0.79969 | 0.20031 | 0.00031 | -0.00031 | 14388 | 3604 |
6 | 0.79991 | 0.20009 | 0.00009 | -0.00009 | 19777 | 4947 |
7 | 0.80072 | 0.19928 | -0.00072 | 0.00072 | 25084 | 6243 |
8 | 0.80025 | 0.19975 | -0.00025 | 0.00025 | 34607 | 8638 |
9 | 0.80046 | 0.19954 | -0.00046 | 0.00046 | 65681 | 16373 |

Created new train_gdf and test_gdf. Train len: 187991,test len: 46928
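The summary above reports, for each of the `n_bins` buckets, how close the train and test shares are to the requested 80/20 split. The general idea behind a bucket-stratified spatial split can be sketched as follows: rank the H3 cells by their point count, cut the ranking into equally sized buckets, and send a fixed fraction of each bucket's cells to the test set, so that whole cells (never individual points) move together. This is an illustrative standard-library sketch on synthetic counts; `stratified_cell_split` is a hypothetical helper and not the srai implementation, which may bin and balance differently.

```python
import random


def stratified_cell_split(cell_counts, test_size=0.2, n_bins=10, seed=42):
    """Split cell IDs into train/test so each count bucket gives ~test_size to test."""
    rng = random.Random(seed)
    ranked = sorted(cell_counts, key=cell_counts.get)  # cells ordered by point count
    bucket_size = -(-len(ranked) // n_bins)  # ceiling division
    train, test = [], []
    for start in range(0, len(ranked), bucket_size):
        bucket = ranked[start:start + bucket_size]
        rng.shuffle(bucket)  # random choice of test cells within the bucket
        n_test = round(len(bucket) * test_size)
        test.extend(bucket[:n_test])
        train.extend(bucket[n_test:])
    return train, test


# Synthetic data: 1000 hypothetical cells with skewed point counts.
data_rng = random.Random(0)
cell_counts = {f"cell_{i}": int(data_rng.expovariate(1 / 40)) + 1 for i in range(1000)}

train_cells, test_cells = stratified_cell_split(cell_counts)

# Share of points (not cells) that ended up in the test set.
test_points = sum(cell_counts[c] for c in test_cells)
total_points = sum(cell_counts.values())
print(len(train_cells), len(test_cells), round(test_points / total_points, 3))
```

Because cells with similar counts are grouped before sampling, the share of points in the test set stays close to `test_size` even when a few cells hold most of the incidents.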
In [17]:
type(chicago_crime.train_gdf), type(chicago_crime.test_gdf)
Out[17]:
(geopandas.geodataframe.GeoDataFrame, geopandas.geodataframe.GeoDataFrame)
In [18]:
chicago_crime.resolution
Out[18]:
9
In [19]:
train.head()
Out[19]:
 | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | Year | Updated On | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 12589893 | JF109865 | 01/11/2022 03:00:00 PM | 087XX S KINGSTON AVE | 1565 | SEX OFFENSE | INDECENT SOLICITATION OF A CHILD | RESIDENCE | False | True | 423 | 4 | 7.0 | 46.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.56241 41.73641) |
1 | 12592454 | JF113025 | 01/14/2022 03:55:00 PM | 067XX S MORGAN ST | 2826 | OTHER OFFENSE | HARASSMENT BY ELECTRONIC MEANS | RESIDENCE | False | True | 724 | 7 | 16.0 | 68.0 | 26 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.64944 41.77178) |
2 | 12785595 | JF346553 | 08/05/2022 09:00:00 PM | 072XX S UNIVERSITY AVE | 1544 | SEX OFFENSE | SEXUAL EXPLOITATION OF A CHILD | APARTMENT | True | False | 324 | 3 | 5.0 | 69.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.597 41.76334) |
3 | 12808281 | JF373517 | 08/14/2022 02:00:00 PM | 055XX W ARDMORE AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | RESIDENCE | False | False | 1621 | 16 | 39.0 | 11.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.7664 41.98588) |
4 | 12888104 | JF469015 | 11/10/2022 03:47:00 AM | 072XX S MAY ST | 1477 | WEAPONS VIOLATION | RECKLESS FIREARM DISCHARGE | STREET | False | False | 733 | 7 | 17.0 | 68.0 | 15 | 2022 | 09/14/2023 03:41:59 PM | POINT (-87.65284 41.76261) |
In [20]:
test.head()
Out[20]:
 | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | Year | Updated On | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20 | 12605225 | JF128458 | 01/30/2022 05:30:00 AM | 007XX W COUCH PL | 0486 | BATTERY | DOMESTIC BATTERY SIMPLE | APARTMENT | True | True | 1224 | 12 | 27.0 | 28.0 | 08B | 2022 | 09/16/2023 03:41:56 PM | POINT (-87.64682 41.88511) |
22 | 12622840 | JF150396 | 02/20/2022 10:45:00 AM | 010XX W ROSCOE ST | 0486 | BATTERY | DOMESTIC BATTERY SIMPLE | APARTMENT | True | True | 1924 | 19 | 44.0 | 6.0 | 08B | 2022 | 09/16/2023 03:41:56 PM | POINT (-87.65582 41.94354) |
26 | 12812832 | JF378884 | 06/23/2022 05:35:00 PM | 026XX N MOODY AVE | 0281 | CRIMINAL SEXUAL ASSAULT | NON-AGGRAVATED | APARTMENT | False | True | 2512 | 25 | 36.0 | 19.0 | 02 | 2022 | 09/16/2023 03:41:56 PM | POINT (-87.77952 41.92824) |
35 | 12628255 | JF157037 | 02/26/2022 05:45:00 PM | 002XX W 24TH PL | 0326 | ROBBERY | AGGRAVATED VEHICULAR HIJACKING | STREET | False | False | 914 | 9 | 11.0 | 34.0 | 03 | 2022 | 10/03/2023 03:41:27 PM | POINT (-87.63357 41.84823) |
50 | 12646302 | JF175150 | 03/11/2022 02:10:00 PM | 105XX S MORGAN ST | 0266 | CRIMINAL SEXUAL ASSAULT | PREDATORY | STREET | False | False | 2232 | 22 | 21.0 | 73.0 | 02 | 2022 | 09/29/2023 03:41:29 PM | POINT (-87.64756 41.7024) |