Neighbour dataset
NeighbourDataset.
This dataset is used to train a model to predict whether two regions are neighbours, as defined in the Hex2Vec paper [1].
References
NeighbourDatasetItem
Bases: NamedTuple
Neighbour dataset item.
ATTRIBUTE | DESCRIPTION |
---|---|
X_anchor | Anchor regions. |
X_positive | Positive regions. Data for the regions that are neighbours of regions in X_anchor. |
X_negative | Negative regions. Data for the regions that are NOT neighbours of the regions in X_anchor. |
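The item layout described above can be sketched with a plain `NamedTuple`. This is only an illustration: `TripletItem` is a hypothetical stand-in for `NeighbourDatasetItem`, and plain lists of floats are assumed in place of whatever container type the library actually uses.

```python
from typing import List, NamedTuple


class TripletItem(NamedTuple):
    """Hypothetical stand-in for NeighbourDatasetItem (plain lists assumed)."""

    X_anchor: List[float]    # features of the anchor region
    X_positive: List[float]  # features of a neighbour of the anchor
    X_negative: List[float]  # features of a region that is not a neighbour


item = TripletItem(
    X_anchor=[1.0, 0.0, 2.0],
    X_positive=[1.0, 1.0, 2.0],
    X_negative=[0.0, 5.0, 0.0],
)

# NamedTuple fields can be read by name or unpacked positionally.
anchor, positive, negative = item
```

Because a `NamedTuple` is still a tuple, such items can be unpacked directly in a training loop while keeping readable field names.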
NeighbourDataset
Bases: Dataset[NeighbourDatasetItem], Generic[T]
Dataset for training a model to predict neighbours.
It works by returning triplets of regions: anchor, positive and negative. A model can be trained to predict that the anchor region is a neighbour of the positive region, and that it is not a neighbour of the negative region.
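One way a model could consume such triplets is sketched below. It is an assumption, not the library's training code: embeddings are plain lists, the neighbour probability is taken as the sigmoid of a dot product, and the loss is a binary cross-entropy over the (anchor, positive) and (anchor, negative) pairs. The function names `dot`, `sigmoid` and `triplet_neighbour_loss` are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def triplet_neighbour_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
) -> float:
    """Binary cross-entropy: the positive pair is labelled 1, the negative pair 0."""
    p_positive = sigmoid(dot(anchor, positive))
    p_negative = sigmoid(dot(anchor, negative))
    return -math.log(p_positive) - math.log(1.0 - p_negative)


# Embeddings that agree with the labels give a lower loss than swapped ones.
good_loss = triplet_neighbour_loss([1.0, 0.0], [2.0, 0.0], [-2.0, 0.0])
bad_loss = triplet_neighbour_loss([1.0, 0.0], [-2.0, 0.0], [2.0, 0.0])
```

The loss falls when the anchor scores high against its neighbour and low against the sampled non-neighbour, which is the behaviour the triplets are meant to encourage.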
PARAMETER | DESCRIPTION |
---|---|
data | Data to use for training. Raw counts of features in regions. |
neighbourhood | Neighbourhood to use for training. It has to be initialized with the same data as the data argument. |
negative_sample_k_distance | How many neighbours away to sample negative regions. For example, if k=2, then the negative regions will be sampled from regions that are at least 3 hops away from the anchor region. Has to be >= 2. |
RAISES | DESCRIPTION |
---|---|
ValueError | If negative_sample_k_distance < 2. |
Source code in srai/embedders/hex2vec/neighbour_dataset.py
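The effect of `negative_sample_k_distance` can be illustrated with a small, dependency-free sketch. It assumes the neighbourhood is given as a plain adjacency dictionary and uses a breadth-first search; the helpers `regions_within` and `negative_candidates` are hypothetical and do not reproduce the library's implementation.

```python
from collections import deque
from typing import Dict, List, Set


def regions_within(adjacency: Dict[str, List[str]], start: str, k: int) -> Set[str]:
    """Return every region reachable from `start` in at most `k` hops (start included)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        region, distance = frontier.popleft()
        if distance == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency[region]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, distance + 1))
    return seen


def negative_candidates(adjacency: Dict[str, List[str]], anchor: str, k: int) -> List[str]:
    """List the regions that lie more than `k` hops away from `anchor`."""
    if k < 2:
        raise ValueError("negative_sample_k_distance must be >= 2")
    excluded = regions_within(adjacency, anchor, k)
    return sorted(set(adjacency) - excluded)


# Six regions in a chain: a - b - c - d - e - f
CHAIN = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}

# With k=2, regions a, b and c are excluded, so candidates start 3 hops away.
candidates = negative_candidates(CHAIN, "a", k=2)
```

A negative region for the anchor could then be drawn from `candidates`, for example with `random.choice`.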
__len__
Return the number of anchor-positive pairs available in the dataset.
RETURNS | DESCRIPTION |
---|---|
int | The number of pairs. |
__getitem__
Return a single dataset item (anchor, positive, negative).
PARAMETER | DESCRIPTION |
---|---|
data_row_index | The index of the dataset item to return. |
RETURNS | DESCRIPTION |
---|---|
NeighbourDatasetItem | The dataset item. This includes the anchor region, the positive region and a randomly sampled negative region. |
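How the length and item lookup fit together can be sketched with a toy map-style dataset. Everything here is an assumption made for illustration: `ToyNeighbourDataset` and `Triplet` are hypothetical names, features are plain lists, negative candidates are precomputed by hand, and each (anchor, positive) pair is counted once per direction.

```python
import random
from typing import Dict, List, NamedTuple


class Triplet(NamedTuple):
    X_anchor: List[float]
    X_positive: List[float]
    X_negative: List[float]


class ToyNeighbourDataset:
    """Dependency-free imitation of a triplet dataset; not the srai implementation."""

    def __init__(
        self,
        features: Dict[str, List[float]],
        neighbours: Dict[str, List[str]],
        negatives: Dict[str, List[str]],
        seed: int = 0,
    ) -> None:
        self._features = features
        self._negatives = negatives
        # One entry per (anchor, positive) pair; this list defines the dataset length.
        self._pairs = [(a, p) for a in sorted(neighbours) for p in neighbours[a]]
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._pairs)

    def __getitem__(self, data_row_index: int) -> Triplet:
        anchor, positive = self._pairs[data_row_index]
        # The negative region is drawn at random on every access.
        negative = self._rng.choice(self._negatives[anchor])
        return Triplet(
            self._features[anchor],
            self._features[positive],
            self._features[negative],
        )


# Six regions in a chain: a - b - c - d - e - f
features = {name: [float(i), float(i % 2)] for i, name in enumerate("abcdef")}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
              "d": ["c", "e"], "e": ["d", "f"], "f": ["e"]}
# Regions more than 2 hops away from each anchor (precomputed by hand).
negatives = {"a": ["d", "e", "f"], "b": ["e", "f"], "c": ["f"],
             "d": ["a"], "e": ["a", "b"], "f": ["a", "b", "c"]}

dataset = ToyNeighbourDataset(features, neighbours, negatives)
first = dataset[0]  # anchor "a", positive "b", negative drawn from d, e or f
```

Since the negative is resampled on each lookup, repeated reads of the same index may return different triplets, while the anchor and positive stay fixed.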