Hex2vec embedder
In [1]:
from srai.embedders import Hex2VecEmbedder
from srai.joiners import IntersectionJoiner
from srai.loaders import OSMOnlineLoader
from srai.neighbourhoods import H3Neighbourhood
from srai.regionalizers import H3Regionalizer, geocode_to_region_gdf
from srai.plotting import plot_regions, plot_numeric_data
from pytorch_lightning import seed_everything
In [2]:
SEED = 71
seed_everything(SEED)
Seed set to 71
Out[2]:
71
Load data from OSM
First, use geocoding to get the area of interest.
In [3]:
area_gdf = geocode_to_region_gdf("Wrocław, Poland")
plot_regions(area_gdf, tiles_style="CartoDB positron")
Out[3]:
[Interactive map: boundary of Wrocław]
Next, download the data for the selected region and the specified tags. We're using the OSMOnlineLoader here, as it's faster for a small number of tags. In a real-life scenario with more tags, you would likely want to use the OSMPbfLoader instead.
In [4]:
tags = {
    "leisure": "park",
    "landuse": "forest",
    "amenity": ["bar", "restaurant", "cafe"],
    "water": "river",
    "sport": "soccer",
}
loader = OSMOnlineLoader()
features_gdf = loader.load(area_gdf, tags)
folium_map = plot_regions(area_gdf, colormap=["rgba(0,0,0,0)"], tiles_style="CartoDB positron")
features_gdf.explore(m=folium_map)
Downloading sport: soccer : 100%|██████████| 7/7 [00:02<00:00, 3.00it/s]
Out[4]:
[Interactive map: downloaded OSM features over the Wrocław area]
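The tags filter mixes single values and lists. Each (key, value) combination can be treated as a separate feature, so the filter above describes seven features, consistent with the 7 download steps in the progress bar. A minimal sketch of that expansion follows; `flatten_tags` and the `key_value` naming are hypothetical illustrations, not srai functions.

```python
from typing import Dict, List, Union


# Hypothetical helper (not part of srai): expand a tags filter into one
# "key_value" feature name per (key, value) pair.
def flatten_tags(tags: Dict[str, Union[str, List[str]]]) -> List[str]:
    names: List[str] = []
    for key, values in tags.items():
        # A single string value is treated as a one-element list.
        if isinstance(values, str):
            values = [values]
        for value in values:
            names.append(f"{key}_{value}")
    return names


tags = {
    "leisure": "park",
    "landuse": "forest",
    "amenity": ["bar", "restaurant", "cafe"],
    "water": "river",
    "sport": "soccer",
}
features = flatten_tags(tags)
print(len(features))  # 7
print(features)
```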
Prepare the data for embedding
After downloading the data, we need to prepare it for embedding: regionalize the selected area, then join the features with the regions.
In [5]:
regionalizer = H3Regionalizer(resolution=9)
regions_gdf = regionalizer.transform(area_gdf)
plot_regions(regions_gdf, tiles_style="CartoDB positron")
Out[5]:
[Interactive map: H3 regions at resolution 9 covering Wrocław]
In [6]:
joiner = IntersectionJoiner()
joint_gdf = joiner.transform(regions_gdf, features_gdf)
joint_gdf
Out[6]:
| region_id | feature_id |
|---|---|
| 891e204756fffff | way/342560157 |
| 891e20471d7ffff | way/342560157 |
| 891e20471d3ffff | way/342560157 |
| 891e204716fffff | relation/3674824 |
| 891e204717bffff | relation/3674824 |
| ... | ... |
| 891e2042a6fffff | way/1044720759 |
| 891e2042a6fffff | node/1984033864 |
| 891e2042a6fffff | way/310477092 |
| 891e2042a6fffff | way/1224586685 |
| 891e20406c7ffff | way/313424655 |

3949 rows × 0 columns (region_id and feature_id form a two-level index, so there are no data columns)
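The join result has one row per intersecting (region, feature) pair, so a feature that crosses several hexagons, such as way/342560157 above, appears once per hexagon. The sketch below reproduces that behaviour in plain Python under a loud simplification: regions and features are axis-aligned boxes rather than H3 hexagons and OSM geometries, and `intersection_join` is a hypothetical helper, not srai's `IntersectionJoiner`.

```python
from typing import Dict, List, Tuple

# A box is (minx, miny, maxx, maxy); a simplification of real geometries.
Box = Tuple[float, float, float, float]


def boxes_intersect(a: Box, b: Box) -> bool:
    """True if two closed boxes share at least one point (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersection_join(regions: Dict[str, Box], features: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Return every (region_id, feature_id) pair whose boxes intersect."""
    pairs = []
    for region_id, region in regions.items():
        for feature_id, feature in features.items():
            if boxes_intersect(region, feature):
                pairs.append((region_id, feature_id))
    return pairs


regions = {"r1": (0, 0, 1, 1), "r2": (1, 0, 2, 1), "r3": (2, 0, 3, 1)}
features = {
    "way/river": (0.5, 0.4, 2.5, 0.6),  # long feature crossing all three regions
    "node/cafe": (1.5, 0.5, 1.5, 0.5),  # point feature inside r2 only
}
for pair in intersection_join(regions, features):
    print(pair)
```

A plain double loop is fine for a toy example; for thousands of regions a spatial index (for example an R-tree) would be the usual choice.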
Embedding
After preparing the data, we can proceed with generating embeddings for the regions.
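Hex2Vec learns from how many features of each kind fall inside a region. Assuming plain per-region counts (the library may preprocess them differently), the joined pairs can be turned into fixed-length vectors as sketched below; `count_vectors` and its arguments are hypothetical. Regions with no joined features all get the same all-zero vector, and since a trained encoder is deterministic, identical inputs yield identical embeddings: one plausible reason why several rows in the embedding table share exactly the same values.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_vectors(
    region_ids: List[str],
    pairs: List[Tuple[str, str]],
    feature_labels: Dict[str, str],
    vocabulary: List[str],
) -> Dict[str, List[int]]:
    """Count joined features per region, one position per vocabulary label."""
    counts = {region_id: Counter() for region_id in region_ids}
    for region_id, feature_id in pairs:
        counts[region_id][feature_labels[feature_id]] += 1
    # A Counter returns 0 for missing labels, so featureless regions get all zeros.
    return {rid: [c[label] for label in vocabulary] for rid, c in counts.items()}


region_ids = ["r1", "r2", "r3", "r4"]
pairs = [("r1", "way/1"), ("r2", "way/1"), ("r2", "node/2"), ("r2", "node/3")]
feature_labels = {"way/1": "water_river", "node/2": "amenity_cafe", "node/3": "amenity_cafe"}
vocabulary = ["amenity_cafe", "water_river"]

for region_id, vector in count_vectors(region_ids, pairs, feature_labels, vocabulary).items():
    print(region_id, vector)
```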
In [7]:
import warnings

neighbourhood = H3Neighbourhood(regions_gdf)
# Encoder layer sizes: a 15-unit hidden layer followed by a 10-dimensional embedding.
embedder = Hex2VecEmbedder([15, 10])

# Suppress library warnings emitted during training to keep the output readable.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    embeddings = embedder.fit_transform(
        regions_gdf,
        features_gdf,
        joint_gdf,
        neighbourhood,
        trainer_kwargs={"max_epochs": 5, "accelerator": "cpu"},
        batch_size=100,
    )
embeddings
100%|██████████| 3168/3168 [00:00<00:00, 31264.61it/s]
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name    | Type       | Params
---------------------------------------
0 | encoder | Sequential | 280
---------------------------------------
280       Trainable params
0         Non-trainable params
280       Total params
0.001     Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=5` reached.
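The 280 trainable parameters in the summary can be checked by hand, assuming the encoder is a stack of fully connected layers that maps the 7 tag features through the sizes [15, 10]: each layer contributes in × out weights plus out biases, giving (7 × 15 + 15) + (15 × 10 + 10) = 120 + 160 = 280. Activation layers add no parameters. A small hypothetical helper doing the same arithmetic:

```python
from typing import List


def mlp_param_count(input_size: int, layer_sizes: List[int]) -> int:
    """Weights plus biases of a stack of fully connected layers."""
    total = 0
    previous = input_size
    for size in layer_sizes:
        total += previous * size + size  # weight matrix + bias vector
        previous = size
    return total


print(mlp_param_count(7, [15, 10]))  # 280
```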
Out[7]:
| region_id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 891e2040173ffff | 0.343880 | -0.223290 | -0.012038 | 0.266015 | 0.214774 | -0.123038 | -0.256550 | -0.107333 | -0.217049 | 0.302045 |
| 891e204756fffff | 0.096392 | 0.220119 | 0.465636 | 0.581243 | -0.187880 | -0.037154 | 0.366241 | -0.418458 | 0.386511 | 0.119130 |
| 891e204716fffff | -0.330573 | 0.020956 | -0.333520 | -0.457863 | 0.017288 | 0.245358 | -0.368664 | 0.331725 | -0.074456 | -0.032008 |
| 891e20414b3ffff | -0.330574 | 0.020956 | -0.333520 | -0.457863 | 0.017288 | 0.245358 | -0.368664 | 0.331725 | -0.074456 | -0.032008 |
| 891e20442d3ffff | -0.464493 | 0.138147 | -0.393941 | -0.574460 | -0.067107 | 0.280365 | -0.315895 | 0.398935 | -0.085970 | -0.146017 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 891e20406c7ffff | -0.596529 | 0.255250 | -0.453033 | -0.690959 | -0.152528 | 0.314501 | -0.261384 | 0.465135 | -0.097409 | -0.261858 |
| 891e2040d9bffff | 0.343880 | -0.223290 | -0.012038 | 0.266015 | 0.214774 | -0.123038 | -0.256550 | -0.107333 | -0.217049 | 0.302045 |
| 891e204e1d3ffff | 0.343880 | -0.223290 | -0.012038 | 0.266015 | 0.214774 | -0.123038 | -0.256550 | -0.107333 | -0.217049 | 0.302045 |
| 891e205a9a7ffff | 0.343880 | -0.223290 | -0.012038 | 0.266015 | 0.214774 | -0.123038 | -0.256550 | -0.107333 | -0.217049 | 0.302045 |
| 891e2042e7bffff | -0.207339 | -0.090883 | -0.147170 | -0.687353 | -0.153512 | 0.164471 | 0.044825 | 0.309862 | -0.101235 | -0.483526 |

3168 rows × 10 columns
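In the Hex2Vec approach, the encoder is trained with a skip-gram-style objective: neighbouring regions should get similar embeddings, random non-neighbours dissimilar ones. The sketch below assumes that scheme, with one random negative per positive pair and a dot-product score passed through a sigmoid and binary cross-entropy. The neighbour dictionary stands in for `H3Neighbourhood`, the embeddings are fixed toy vectors rather than encoder outputs, and all function names are hypothetical; srai's actual sampling details may differ.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def sample_pairs(neighbours: Dict[str, Set[str]], rng: random.Random) -> List[Tuple[str, str, int]]:
    """Return (target, context, label) triples: 1 for neighbours, 0 for random non-neighbours."""
    region_ids = sorted(neighbours)
    triples = []
    for target in region_ids:
        # Candidate negatives: every region that is neither the target nor adjacent to it.
        negatives = [r for r in region_ids if r != target and r not in neighbours[target]]
        for context in sorted(neighbours[target]):
            triples.append((target, context, 1))
            if negatives:
                triples.append((target, rng.choice(negatives), 0))
    return triples


def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy of sigmoid(logit) against a 0/1 label, in a numerically stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def pair_loss(embeddings: Dict[str, List[float]], target: str, context: str, label: int) -> float:
    """Score a pair by the dot product of its embeddings and compute the loss."""
    logit = sum(a * b for a, b in zip(embeddings[target], embeddings[context]))
    return bce_with_logits(logit, label)


# Toy data: four regions in a row (a-b-c-d) and fixed 2-D embeddings.
neighbours = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
toy_embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}

triples = sample_pairs(neighbours, random.Random(71))
mean_loss = sum(pair_loss(toy_embeddings, t, c, y) for t, c, y in triples) / len(triples)
print(f"{len(triples)} pairs, mean loss {mean_loss:.3f}")
```

Training lowers this loss by increasing the dot product for positive pairs and decreasing it for negative ones, which is what pulls neighbouring regions together in embedding space.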
Visualizing the embeddings' similarity
In [8]:
from sklearn.cluster import KMeans

# Group regions with similar embeddings; n_init is set explicitly to
# silence scikit-learn's FutureWarning about its changing default.
clusterizer = KMeans(n_clusters=5, n_init=10, random_state=SEED)
clusterizer.fit(embeddings)
embeddings["cluster"] = clusterizer.labels_
embeddings
Out[8]:
| region_id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | cluster |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 891e2040173ffff | 0.343880 | -0.223290 | -0.012038 | 0.266015 | 0.214774 | -0.123038 | -0.256550 | -0.107333 | -0.217049 | 0.302045 | 1 |
| 891e204756fffff | 0.096392 | 0.220119 | 0.465636 | 0.581243 | -0.187880 | -0.037154 | 0.366241 | -0.418458 | 0.386511 | 0.119130 | 3 |
| 891e204716fffff | -0.330573 | 0.020956 | -0.333520 | -0.457863 | 0.017288 | 0.245358 | -0.368664 | 0.331725 | -0.074456 | -0.032008 | 2 |
| 891e20414b3ffff | -0.330574 | 0.020956 | -0.333520 | -0.457863 | 0.017288 | 0.245358 | -0.368664 | 0.331725 | -0.074456 | -0.032008 | 2 |
| 891e20442d3ffff | -0.464493 | 0.138147 | -0.393941 | -0.574460 | -0.067107 | 0.280365 | -0.315895 | 0.398935 | -0.085970 | -0.146017 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 891e20406c7ffff | -0.596529 | 0.255250 | -0.453033 | -0.690959 | -0.152528 | 0.314501 | -0.261384 | 0.465135 | -0.097409 | -0.261858 | 2 |
| 891e2040d9bffff | 0.343880 | -0.223290 | -0.012038 | 0.266015 | 0.214774 | -0.123038 | -0.256550 | -0.107333 | -0.217049 | 0.302045 | 1 |
| 891e204e1d3ffff | 0.343880 | -0.223290 | -0.012038 | 0.266015 | 0.214774 | -0.123038 | -0.256550 | -0.107333 | -0.217049 | 0.302045 | 1 |
| 891e205a9a7ffff | 0.343880 | -0.223290 | -0.012038 | 0.266015 | 0.214774 | -0.123038 | -0.256550 | -0.107333 | -0.217049 | 0.302045 | 1 |
| 891e2042e7bffff | -0.207339 | -0.090883 | -0.147170 | -0.687353 | -0.153512 | 0.164471 | 0.044825 | 0.309862 | -0.101235 | -0.483526 | 2 |

3168 rows × 11 columns
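`KMeans` groups regions whose embedding vectors lie close together. `n_clusters=5` fixes the number of groups, and `n_init` sets how many differently seeded runs scikit-learn performs before keeping the one with the lowest inertia. The sketch below is a plain-Python version of Lloyd's algorithm, the classic k-means procedure, with a single random initialization and toy 2-D points standing in for the 10-dimensional embeddings; scikit-learn additionally uses k-means++ seeding by default and optimized routines, so this is an illustration rather than a drop-in replacement.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, n_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Single-initialization Lloyd's algorithm; returns labels and centroids."""
    rng = random.Random(seed)
    # Initialize centroids with k input points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(coords) / len(members) for coords in zip(*members)]
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```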
In [9]:
plot_numeric_data(regions_gdf, "cluster", embeddings)
Out[9]:
[Interactive map: regions colored by KMeans cluster]