ChicagoCrimeDataset
srai.datasets.ChicagoCrimeDataset
Bases: PointDataset
Chicago Crime dataset.
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.
Source code in srai/datasets/chicago_crime.py
get_h3_with_labels
get_h3_with_labels() -> (
tuple[
gpd.GeoDataFrame, Optional[gpd.GeoDataFrame], Optional[gpd.GeoDataFrame]
]
)
Returns h3 indexes with target labels from the dataset.
Points are aggregated to hexes and the target column values are averaged. If the target column is None, the number of points within each hex is calculated and scaled to [0,1].
RETURNS | DESCRIPTION |
---|---|
tuple[gpd.GeoDataFrame, Optional[gpd.GeoDataFrame], Optional[gpd.GeoDataFrame]] | Train, Val, Test hexes with target labels in GeoDataFrames |
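The aggregation rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the `aggregate_to_hexes` helper and the `hex_id` and `value` field names are hypothetical, each point is assumed to already carry the id of the hex it falls in, and min-max scaling is assumed for the "scaled to [0,1]" step.

```python
from collections import defaultdict
from typing import Optional


def aggregate_to_hexes(
    points: list[dict], target_column: Optional[str] = None
) -> dict[str, float]:
    """Aggregate point records to one label per hex id (hypothetical helper)."""
    groups: dict[str, list[float]] = defaultdict(list)
    for point in points:
        # With a target column, collect its value; otherwise record one unit per point.
        value = float(point[target_column]) if target_column is not None else 1.0
        groups[point["hex_id"]].append(value)

    if not groups:
        return {}

    if target_column is not None:
        # Average the target values inside each hex.
        return {hex_id: sum(vals) / len(vals) for hex_id, vals in groups.items()}

    # No target column: count points per hex, then min-max scale the counts
    # to [0, 1]. The choice of min-max scaling is an assumption.
    counts = {hex_id: float(len(vals)) for hex_id, vals in groups.items()}
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        return {hex_id: 0.0 for hex_id in counts}
    return {hex_id: (count - lo) / (hi - lo) for hex_id, count in counts.items()}


# Illustrative records only; these are not taken from the dataset.
points = [
    {"hex_id": "a", "value": 2.0},
    {"hex_id": "a", "value": 4.0},
    {"hex_id": "a", "value": 6.0},
    {"hex_id": "b", "value": 10.0},
    {"hex_id": "c", "value": 1.0},
    {"hex_id": "c", "value": 3.0},
]
print(aggregate_to_hexes(points, target_column="value"))
print(aggregate_to_hexes(points))
```

In the first call each hex receives the mean of its target values; in the second, the hex with the most points maps to 1.0 and the hex with the fewest maps to 0.0.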
Source code in srai/datasets/_base.py
load
load(
version: Optional[Union[int, str]] = 9, hf_token: Optional[str] = None
) -> dict[str, gpd.GeoDataFrame]
Method to load dataset.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
hf_token | Optional[str] | If needed, a User Access Token to authenticate to the Hugging Face Hub. Environment variable |
version | Optional[Union[int, str]] | Version of the dataset. Available: official spatial train-test split from year 2022 in the chosen h3 resolution: '8', '9', '10'. Defaults to '9'. Raw data from other years is available as: '2020', '2021', '2022'. |
RETURNS | DESCRIPTION |
---|---|
dict[str, gpd.GeoDataFrame] | Dictionary with all splits loaded from the dataset. Will contain keys "train" and "test" if available. |
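The two families of `version` values listed above can be told apart with a small lookup. The sketch below only restates the documented values; `describe_version` and the two constant sets are hypothetical names for illustration and are not part of srai, and how the library itself validates `version` is not shown in this page.

```python
from typing import Union

# Values taken from the parameter description above.
OFFICIAL_RESOLUTIONS = {"8", "9", "10"}  # official 2022 train-test split, per h3 resolution
RAW_YEARS = {"2020", "2021", "2022"}     # raw yearly data


def describe_version(version: Union[int, str] = 9) -> str:
    """Explain what a given `version` value selects (hypothetical helper)."""
    key = str(version)  # the signature accepts both int and str
    if key in OFFICIAL_RESOLUTIONS:
        return f"official 2022 train-test split at h3 resolution {key}"
    if key in RAW_YEARS:
        return f"raw data from {key}"
    raise ValueError(f"Unknown version: {version!r}")


print(describe_version(9))
print(describe_version("2021"))
```

Normalizing to `str` first lets `9` and `"9"` resolve to the same entry, which matches the `Union[int, str]` annotation in the signature.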
Source code in srai/datasets/chicago_crime.py
train_test_split
train_test_split(
target_column: Optional[str] = None,
resolution: Optional[int] = None,
test_size: float = 0.2,
n_bins: int = 7,
random_state: Optional[int] = None,
validation_split: bool = False,
force_split: bool = False,
task: Optional[str] = None,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]
Method to generate splits from GeoDataFrame, based on the target_column values.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
target_column | Optional[str] | Target column name. If None, the split is generated based on the number of points within a hex of a given resolution. Defaults to preset dataset target column. |
resolution | Optional[int] | h3 resolution used to regionalize the data. Defaults to the default value from the dataset. |
test_size | float | Proportion of the data assigned to the test set. Defaults to 0.2. |
n_bins | int | Number of buckets used to stratify the target data. Defaults to 7. |
random_state | Optional[int] | Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. Defaults to None. |
validation_split | bool | If True, creates a validation split from the existing train split and assigns it to self.val_gdf. |
force_split | bool | If True, forces a new split to be created, even if an existing train/test or validation split is already present. - With |
task | Optional[str] | Currently not supported. Ignored in this subclass. |
RETURNS | DESCRIPTION |
---|---|
tuple[gpd.GeoDataFrame, gpd.GeoDataFrame] | Train-test or train-val split made on previous train subset. |
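To show how `test_size`, `n_bins`, and `random_state` interact, here is a minimal sketch of a stratified split over a list of target values. It is an illustration under stated assumptions, not the code in `srai/datasets/_base.py`: the `stratified_split` helper is hypothetical, it works on plain lists rather than hexes in a GeoDataFrame, and equal-frequency binning is assumed as the bucketing rule.

```python
import random
from typing import Optional, Sequence


def stratified_split(
    values: Sequence[float],
    test_size: float = 0.2,
    n_bins: int = 7,
    random_state: Optional[int] = None,
) -> tuple[list[int], list[int]]:
    """Return (train_indices, test_indices), stratified by value buckets."""
    rng = random.Random(random_state)  # an int seed makes the split reproducible

    # Rank indices by target value, then cut the ranking into n_bins
    # buckets of roughly equal size (equal-frequency binning, assumed).
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets: list[list[int]] = [[] for _ in range(n_bins)]
    for rank, idx in enumerate(order):
        buckets[rank * n_bins // len(order)].append(idx)

    train: list[int] = []
    test: list[int] = []
    for bucket in buckets:
        # Shuffle inside each bucket, then move a test_size share to the test set.
        rng.shuffle(bucket)
        n_test = round(len(bucket) * test_size)
        test.extend(bucket[:n_test])
        train.extend(bucket[n_test:])
    return sorted(train), sorted(test)


values = [float(v) for v in range(70)]
train_idx, test_idx = stratified_split(values, test_size=0.2, n_bins=7, random_state=42)
print(len(train_idx), len(test_idx))  # 56 14
```

Because every bucket contributes the same share to the test set, low and high target values are both represented in each split. Calling the function twice with the same `random_state` yields identical index lists.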
Source code in srai/datasets/_base.py