Index
Hex2Vec.
¶
Bases: CountEmbedder
Hex2Vec Embedder.
PARAMETER | DESCRIPTION |
---|---|
encoder_sizes | Sizes of the encoder layers. The input layer size shouldn't be included - it's inferred from the data. The last element is the embedding size. Defaults to [150, 75, 50]. |
Source code in srai/embedders/hex2vec/embedder.py
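As a rough illustration of how the input size can be inferred from the data, the sketch below prepends the number of feature columns to `encoder_sizes` to obtain the full list of layer sizes. The helper name `full_layer_sizes` and the toy feature names are hypothetical and not part of srai; the embedder may derive the input size differently.

```python
from typing import Dict, List, Optional


def full_layer_sizes(
    feature_counts: Dict[str, List[int]],
    encoder_sizes: Optional[List[int]] = None,
) -> List[int]:
    """Prepend the inferred input size to the encoder layer sizes.

    Hypothetical helper: `feature_counts` maps a feature name to its
    per-region counts, so the number of keys is the input layer size.
    """
    if encoder_sizes is None:
        encoder_sizes = [150, 75, 50]  # default documented for encoder_sizes
    return [len(feature_counts), *encoder_sizes]


# Toy count table with three features observed in two regions.
counts = {
    "amenity_cafe": [2, 0],
    "leisure_park": [1, 3],
    "shop_bakery": [0, 1],
}
sizes = full_layer_sizes(counts)
```

Here the first element of `sizes` is the number of features and the last element is the embedding size.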
¶
Create region embeddings.
PARAMETER | DESCRIPTION |
---|---|
regions_gdf | Region indexes and geometries. |
features_gdf | Feature indexes, geometries and feature values. |
joint_gdf | Joiner result with region-feature multi-index. |

RETURNS | DESCRIPTION |
---|---|
DataFrame | pd.DataFrame: Embedding and geometry index for each region in regions_gdf. |

RAISES | DESCRIPTION |
---|---|
ValueError | If features_gdf is empty and self.expected_output_features is not set. |
ValueError | If any of the gdfs index names is None. |
ValueError | If joint_gdf.index is not of type pd.MultiIndex or doesn't have 2 levels. |
ValueError | If index levels in gdfs don't overlap correctly. |
Source code in srai/embedders/hex2vec/embedder.py
fit(
regions_gdf,
features_gdf,
joint_gdf,
neighbourhood,
negative_sample_k_distance=2,
batch_size=32,
learning_rate=0.001,
trainer_kwargs=None,
)
Fit the model to the data.
PARAMETER | DESCRIPTION |
---|---|
regions_gdf | Region indexes and geometries. |
features_gdf | Feature indexes, geometries and feature values. |
joint_gdf | Joiner result with region-feature multi-index. |
neighbourhood | The neighbourhood to use. Should be initialized with the same regions. |
negative_sample_k_distance | When sampling negative samples, sample from a distance > k. Defaults to 2. |
batch_size | Batch size. Defaults to 32. |
learning_rate | Learning rate. Defaults to 0.001. |
trainer_kwargs | Trainer kwargs. Defaults to None. |

RAISES | DESCRIPTION |
---|---|
ValueError | If features_gdf is empty and self.expected_output_features is not set. |
ValueError | If any of the gdfs index names is None. |
ValueError | If joint_gdf.index is not of type pd.MultiIndex or doesn't have 2 levels. |
ValueError | If index levels in gdfs don't overlap correctly. |
ValueError | If negative_sample_k_distance < 2. |
Source code in srai/embedders/hex2vec/embedder.py
fit_transform(
regions_gdf,
features_gdf,
joint_gdf,
neighbourhood,
negative_sample_k_distance=2,
batch_size=32,
learning_rate=0.001,
trainer_kwargs=None,
)
Fit the model to the data and return the embeddings.
PARAMETER | DESCRIPTION |
---|---|
regions_gdf | Region indexes and geometries. |
features_gdf | Feature indexes, geometries and feature values. |
joint_gdf | Joiner result with region-feature multi-index. |
neighbourhood | The neighbourhood to use. Should be initialized with the same regions. |
negative_sample_k_distance | When sampling negative samples, sample from a distance > k. Defaults to 2. |
batch_size | Batch size. Defaults to 32. |
learning_rate | Learning rate. Defaults to 0.001. |
trainer_kwargs | Trainer kwargs. Defaults to None. |

RETURNS | DESCRIPTION |
---|---|
DataFrame | pd.DataFrame: Region embeddings. |

RAISES | DESCRIPTION |
---|---|
ValueError | If features_gdf is empty and self.expected_output_features is not set. |
ValueError | If any of the gdfs index names is None. |
ValueError | If joint_gdf.index is not of type pd.MultiIndex or doesn't have 2 levels. |
ValueError | If index levels in gdfs don't overlap correctly. |
ValueError | If negative_sample_k_distance < 2. |
Source code in srai/embedders/hex2vec/embedder.py
¶
Save the model to a directory.
PARAMETER | DESCRIPTION |
---|---|
path | Path to the directory. |
Source code in srai/embedders/hex2vec/embedder.py
¶
classmethod
Load the model from a directory.
PARAMETER | DESCRIPTION |
---|---|
path | Path to the directory. |

RETURNS | DESCRIPTION |
---|---|
Hex2VecEmbedder | The loaded embedder. |
Source code in srai/embedders/hex2vec/embedder.py
¶
Bases: Model
Hex2Vec embedding model.
This class implements the embedding model from the Hex2Vec paper. It is based on a skip-gram model with negative sampling and triplet loss. The model takes vectors of numbers (raw counts of features per region) as input and outputs dense embeddings.
PARAMETER | DESCRIPTION |
---|---|
layer_sizes | List of sizes for the model layers. The first element is the input size (number of features), the last element is the output (embedding) size. |
learning_rate | Learning rate. Defaults to 0.001. |

RAISES | DESCRIPTION |
---|---|
ValueError | If layer_sizes contains fewer than 2 elements. |
Source code in srai/embedders/hex2vec/model.py
¶
Get model config.
Source code in srai/embedders/_base.py
¶
Save the model to a directory.
PARAMETER | DESCRIPTION |
---|---|
path | Path to the directory. |
¶
classmethod
Load model from a file.
PARAMETER | DESCRIPTION |
---|---|
path | Path to the file. |
**kwargs | Additional kwargs to pass to the model constructor. |
Source code in srai/embedders/_base.py
¶
Calculate embedding for a region.
PARAMETER | DESCRIPTION |
---|---|
X_anchor | Region features. |
¶
Predict the probability of X_anchor being neighbours with X_context.
X_anchor and X_context are assumed to have the same batch size. The probabilities are calculated in pairs, i.e. the first element of X_anchor is compared with the first element of X_context.
PARAMETER | DESCRIPTION |
---|---|
X_anchor | Anchor regions. |
X_context | Context regions. |
Source code in srai/embedders/hex2vec/model.py
¶
Predict raw unnormalized scores of X_anchor being neighbours with X_context.
X_anchor and X_context are assumed to have the same batch size. The scores are calculated in pairs, i.e. the first element of X_anchor is compared with the first element of X_context. In order to get probabilities, use the sigmoid function.
PARAMETER | DESCRIPTION |
---|---|
X_anchor | Anchor regions. |
X_context | Context regions. |
Source code in srai/embedders/hex2vec/model.py
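The pairwise scoring described above can be sketched in plain Python. This is a minimal illustration that assumes the inputs have already been passed through the encoder and that the raw score is the dot product of paired embeddings; the function names `pairwise_scores` and `sigmoid` are hypothetical and are not the model's methods.

```python
import math
from typing import List, Sequence


def pairwise_scores(
    anchor_emb: Sequence[Sequence[float]],
    context_emb: Sequence[Sequence[float]],
) -> List[float]:
    """Dot product of the i-th anchor row with the i-th context row."""
    if len(anchor_emb) != len(context_emb):
        raise ValueError("anchor and context batches must have the same size")
    return [
        sum(a * c for a, c in zip(a_row, c_row))
        for a_row, c_row in zip(anchor_emb, context_emb)
    ]


def sigmoid(score: float) -> float:
    """Map a raw unnormalized score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Two already-embedded anchor/context pairs (toy values).
anchors = [[1.0, 0.0], [0.5, 0.5]]
contexts = [[1.0, 2.0], [-1.0, 1.0]]

scores = pairwise_scores(anchors, contexts)
probabilities = [sigmoid(s) for s in scores]
```

Each score compares only the rows at the same position, so a batch of N anchors and N contexts yields N scores rather than an N by N matrix.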
¶
Perform one training step.
One batch of data consists of 3 tensors:
- X_anchor: Anchor regions.
- X_positive: Positive regions. The regions assumed to be neighbours of the corresponding regions in X_anchor.
- X_negative: Negative regions. The regions assumed to NOT be neighbours of the corresponding regions in X_anchor.
The regions in X_anchor, X_positive and X_negative are first embedded using the encoder.
After that, the dot product of the corresponding embeddings is calculated.
The loss is calculated as a binary cross-entropy between the dot product and the labels.
PARAMETER | DESCRIPTION |
---|---|
batch | Batch of data. |
batch_idx | Batch index. |
Source code in srai/embedders/hex2vec/model.py
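The training step described above can be sketched on already-embedded vectors. This is a minimal pure-Python illustration, not the model's actual implementation: it assumes positive pairs are labelled 1, negative pairs are labelled 0, and the loss is averaged over all pairs. The names `dot`, `bce_with_logits` and `triplet_batch_loss` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two embeddings of equal length."""
    return sum(a * b for a, b in zip(u, v))


def bce_with_logits(score: float, label: float) -> float:
    """Numerically stable binary cross-entropy on a raw score."""
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


def triplet_batch_loss(
    anchor: Sequence[Vector],
    positive: Sequence[Vector],
    negative: Sequence[Vector],
) -> float:
    """Average BCE over anchor-positive (label 1) and anchor-negative (label 0) pairs."""
    losses = []
    for a, p, n in zip(anchor, positive, negative):
        losses.append(bce_with_logits(dot(a, p), 1.0))
        losses.append(bce_with_logits(dot(a, n), 0.0))
    return sum(losses) / len(losses)


# Embeddings that agree with the labels give a small loss ...
good_loss = triplet_batch_loss([[1.0, 0.0]], [[3.0, 0.0]], [[-3.0, 0.0]])
# ... while swapping positive and negative gives a large one.
bad_loss = triplet_batch_loss([[1.0, 0.0]], [[-3.0, 0.0]], [[3.0, 0.0]])
```

The loss falls as anchor-positive dot products grow and anchor-negative dot products shrink, which is what pushes neighbouring regions towards similar embeddings.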
¶
Perform one validation step.
PARAMETER | DESCRIPTION |
---|---|
batch | Batch of data. |
batch_idx | Batch index. |
Source code in srai/embedders/hex2vec/model.py
¶
Bases: Dataset[NeighbourDatasetItem], Generic[T]
Dataset for training a model to predict neighbours.
It works by returning triplets of regions: anchor, positive and negative. A model can be trained to predict that the anchor region is a neighbour of the positive region, and that it is not a neighbour of the negative region.
PARAMETER | DESCRIPTION |
---|---|
data | Data to use for training. Raw counts of features in regions. |
neighbourhood | Neighbourhood to use for training. It has to be initialized with the same data as the data argument. |
negative_sample_k_distance | How many neighbours away to sample negative regions. For example, if k=2, then the negative regions will be sampled from regions that are at least 3 hops away from the anchor region. Has to be >= 2. |

RAISES | DESCRIPTION |
---|---|
ValueError | If negative_sample_k_distance < 2. |
Source code in srai/embedders/hex2vec/neighbour_dataset.py
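The effect of `negative_sample_k_distance` can be illustrated with a small graph search. The sketch below is an assumption about the idea, not the dataset's implementation: it represents the neighbourhood as a plain adjacency dictionary, collects every region within k hops of the anchor, and samples a negative from the remainder. The names `regions_within`, `sample_negative` and `chain` are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Set


def regions_within(adjacency: Dict[str, List[str]], start: str, k: int) -> Set[str]:
    """Regions reachable from `start` in at most `k` hops, including `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        region, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency[region]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return seen


def sample_negative(
    adjacency: Dict[str, List[str]], anchor: str, k: int, rng: random.Random
) -> str:
    """Sample a region that is more than `k` hops away from `anchor`."""
    if k < 2:
        raise ValueError("negative_sample_k_distance must be at least 2")
    excluded = regions_within(adjacency, anchor, k)
    candidates = sorted(set(adjacency) - excluded)
    return rng.choice(candidates)  # raises IndexError if no region is far enough


# Six regions in a row: r0 - r1 - r2 - r3 - r4 - r5.
chain = {
    "r0": ["r1"],
    "r1": ["r0", "r2"],
    "r2": ["r1", "r3"],
    "r3": ["r2", "r4"],
    "r4": ["r3", "r5"],
    "r5": ["r4"],
}
negative = sample_negative(chain, "r0", 2, random.Random(0))
```

With k=2 and anchor `r0`, the regions `r0`, `r1` and `r2` are excluded, so the negative comes from regions at least 3 hops away.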
¶
Return the number of anchor-positive pairs available in the dataset.
RETURNS | DESCRIPTION |
---|---|
int | The number of pairs. |
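To make the notion of anchor-positive pairs concrete, the sketch below enumerates them for a toy neighbourhood. It assumes that each ordered (anchor, neighbour) pair counts once, so an edge between two regions contributes two pairs; the dataset may count pairs differently. The adjacency dictionary and the helper `anchor_positive_pairs` are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy neighbourhood: region id -> directly adjacent region ids.
adjacency: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}


def anchor_positive_pairs(adj: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List every ordered (anchor, neighbour) pair in a stable order."""
    return [(anchor, nb) for anchor in sorted(adj) for nb in sorted(adj[anchor])]


pairs = anchor_positive_pairs(adjacency)
n_pairs = len(pairs)
```

Under this assumption the dataset length equals the sum of neighbour counts over all regions, and an integer index selects one pair from the list.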
¶
Return a single dataset item (anchor, positive, negative).
PARAMETER | DESCRIPTION |
---|---|
data_row_index | The index of the dataset item to return. |

RETURNS | DESCRIPTION |
---|---|
NeighbourDatasetItem | The dataset item. This includes the anchor region, positive region and a randomly sampled negative region. |
Source code in srai/embedders/hex2vec/neighbour_dataset.py
¶
Bases: NamedTuple
Neighbour dataset item.
ATTRIBUTE | DESCRIPTION |
---|---|
X_anchor | Anchor regions. |
X_positive | Positive regions. Data for the regions that are neighbours of regions in X_anchor. |
X_negative | Negative regions. Data for the regions that are NOT neighbours of the regions in X_anchor. |