srai.datasets.HuggingFaceDataset ¶
HuggingFaceDataset(
path: str,
version: Optional[str] = None,
type: Optional[str] = None,
numerical_columns: Optional[list[str]] = None,
categorical_columns: Optional[list[str]] = None,
target: Optional[str] = None,
resolution: Optional[int] = None,
)
Bases: ABC
Abstract class for HuggingFace datasets.
Source code in srai/datasets/_base.py
get_h3_with_labels ¶
abstractmethod
get_h3_with_labels() -> (
tuple[
gpd.GeoDataFrame, Optional[gpd.GeoDataFrame], Optional[gpd.GeoDataFrame]
]
)
Returns H3 indexes with target labels from the dataset, depending on the dataset and task type.
RETURNS | DESCRIPTION
---|---
`tuple[gpd.GeoDataFrame, Optional[gpd.GeoDataFrame], Optional[gpd.GeoDataFrame]]` | Train, val and test indexes with target labels in GeoDataFrames.
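The val and test entries of the returned tuple are optional, so consumers should handle `None`. A minimal sketch of that pattern, using a hypothetical stand-in class with plain dicts in place of `gpd.GeoDataFrame` objects (the H3 cell ids below are illustrative only):

```python
# Hypothetical stand-in for a concrete HuggingFaceDataset subclass; a real
# subclass returns GeoDataFrames indexed by H3 cells, not plain dicts.
class FakeDataset:
    def get_h3_with_labels(self):
        train = {"8928308280fffff": 1.0, "8928308280bffff": 0.5}
        val = None  # not every dataset/task type provides a validation split
        test = {"89283082807ffff": 0.7}
        return train, val, test

train, val, test = FakeDataset().get_h3_with_labels()

# Keep only the splits that are actually present.
splits = {
    name: s
    for name, s in [("train", train), ("val", val), ("test", test)]
    if s is not None
}
```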
Source code in srai/datasets/_base.py
load ¶
load(
version: Optional[Union[int, str]] = None, hf_token: Optional[str] = None
) -> dict[str, gpd.GeoDataFrame]
Loads the dataset and returns its available splits.
PARAMETER | DESCRIPTION
---|---
`version` | Version of the dataset. TYPE: `Optional[Union[int, str]]` DEFAULT: `None`
`hf_token` | If needed, a User Access Token used to authenticate to the Hugging Face Hub. An environment variable can also be used. TYPE: `Optional[str]` DEFAULT: `None`
RETURNS | DESCRIPTION
---|---
`dict[str, gpd.GeoDataFrame]` | Dictionary with all splits loaded from the dataset. Will contain keys "train" and "test" if available.
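The returned dictionary only contains the splits the dataset actually ships, so code should check for keys rather than assume both are present. A minimal sketch of that contract, using a hypothetical stub class with plain dicts standing in for `gpd.GeoDataFrame` objects:

```python
# Hypothetical stub mimicking the load() contract; a real subclass would
# download the data from the Hugging Face Hub, authenticating with hf_token
# if the dataset is gated.
class StubDataset:
    def load(self, version=None, hf_token=None):
        # "train" and "test" keys are included only when available.
        return {"train": {"rows": 800}, "test": {"rows": 200}}

splits = StubDataset().load(version="8")
test_split = splits.get("test")  # may be absent for some datasets
```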
Source code in srai/datasets/_base.py
train_test_split ¶
abstractmethod
train_test_split(
target_column: Optional[str] = None,
resolution: Optional[int] = None,
test_size: float = 0.2,
n_bins: int = 7,
random_state: Optional[int] = None,
validation_split: bool = False,
force_split: bool = False,
task: Optional[str] = None,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]
Generates a train/test or train/val split from the GeoDataFrame.
PARAMETER | DESCRIPTION
---|---
`target_column` | Target column name for point datasets, trajectory id column for trajectory datasets. Defaults to the preset dataset target column. TYPE: `Optional[str]`
`resolution` | H3 resolution; subclasses may use this argument to regionalize data. Defaults to the default value from the dataset. TYPE: `Optional[int]`
`test_size` | Fraction of the test set. Defaults to 0.2. TYPE: `float`
`n_bins` | Number of buckets used to stratify target data. Defaults to 7. TYPE: `int`
`random_state` | Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. Defaults to None. TYPE: `Optional[int]`
`validation_split` | If True, creates a validation split from the existing train split and assigns it to `self.val_gdf`. TYPE: `bool`
`force_split` | If True, forces a new split to be created, even if an existing train/test or validation split is already present. TYPE: `bool`
`task` | Task identifier. Subclasses may use this argument to determine stratification logic (e.g., by duration or spatial pattern). Defaults to None. TYPE: `Optional[str]`
RETURNS | DESCRIPTION
---|---
`tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]` | Train/test split, or train/val split made from the previous train subset.
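The stratification described by `n_bins` (bucketing a continuous target so each bucket is represented in both splits) can be sketched in plain Python. This is an illustrative stand-in, not srai's implementation: it works on plain lists instead of GeoDataFrames, and uses equal-width bins over the target range.

```python
import random
from collections import defaultdict

def stratified_split(rows, target, test_size=0.2, n_bins=7, random_state=None):
    """Sketch of stratified splitting: bucket the target into n_bins
    equal-width bins, then draw test_size of each bucket for the test set."""
    rng = random.Random(random_state)
    lo, hi = min(target), max(target)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant target

    # Assign each row index to a bin based on its target value.
    buckets = defaultdict(list)
    for i, y in enumerate(target):
        b = min(int((y - lo) / width), n_bins - 1)
        buckets[b].append(i)

    # Sample test_size of every bucket, so each target range is represented.
    test_idx = set()
    for idx in buckets.values():
        rng.shuffle(idx)
        k = round(len(idx) * test_size)
        test_idx.update(idx[:k])

    train = [rows[i] for i in range(len(rows)) if i not in test_idx]
    test = [rows[i] for i in range(len(rows)) if i in test_idx]
    return train, test

rows = list(range(100))
target = [i / 100 for i in range(100)]
train, test = stratified_split(rows, target, test_size=0.2, n_bins=7, random_state=0)
```

Passing the same `random_state` reproduces the same split, mirroring the role of the `random_state` parameter above.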