Contextual count embedder
from srai.loaders.osm_loaders import OSMPbfLoader
from srai.regionalizers import H3Regionalizer
from srai.joiners import IntersectionJoiner
from srai.embedders import ContextualCountEmbedder
from srai.plotting.folium_wrapper import plot_regions, plot_numeric_data
from srai.neighbourhoods import H3Neighbourhood
Data preparation
In order to use ContextualCountEmbedder, we need to prepare some data. Namely, we need regions_gdf, features_gdf, and joint_gdf. These are the outputs of Regionalizers, Loaders, and Joiners, respectively.
from srai.utils import geocode_to_region_gdf
area_gdf = geocode_to_region_gdf("Lisboa, PT")
plot_regions(area_gdf)
Regionalize the area using an H3Regionalizer
regionalizer = H3Regionalizer(resolution=9, buffer=True)
regions_gdf = regionalizer.transform(area_gdf)
regions_gdf
region_id | geometry |
---|---|
89393375b8bffff | POLYGON ((-9.11998 38.76953, -9.11936 38.76766... |
893933674afffff | POLYGON ((-9.11089 38.74845, -9.11260 38.74711... |
8939336669bffff | POLYGON ((-9.09071 38.77841, -9.09303 38.77894... |
89393362c07ffff | POLYGON ((-9.19399 38.74662, -9.19167 38.74610... |
89393362b4bffff | POLYGON ((-9.16480 38.70369, -9.16419 38.70182... |
... | ... |
893933674bbffff | POLYGON ((-9.11894 38.74682, -9.11662 38.74630... |
89393375e03ffff | POLYGON ((-9.16222 38.77397, -9.15990 38.77344... |
89393362dbbffff | POLYGON ((-9.16657 38.76113, -9.16828 38.75979... |
89393362a23ffff | POLYGON ((-9.17408 38.70580, -9.17176 38.70527... |
89393375c5bffff | POLYGON ((-9.16545 38.79026, -9.16483 38.78839... |
830 rows × 1 columns
Download some objects from OpenStreetMap
You can use both osm_tags_type and grouped_osm_tags_type filters. In this example, a predefined grouped_osm_tags_type filter, BASE_OSM_GROUPS_FILTER, is used.
from srai.loaders.osm_loaders.filters import BASE_OSM_GROUPS_FILTER
loader = OSMPbfLoader()
features_gdf = loader.load(area_gdf, tags=BASE_OSM_GROUPS_FILTER)
features_gdf
[Lisbon, Portugal] Downloading pbf file #1 (Elements): 100%|██████████| 1281730/1281730 [00:03<00:00, 409064.90it/s]
ccfc4ec912ac803c97b939feba28e0a57de61e0e543e36e51c010f9f0a167e37.osm.pbf: 100%|██████████| 7.22M/7.22M [00:00<00:00, 30.6MiB/s]
[Lisbon, Portugal] Counting pbf features: 764106it [00:02, 280209.38it/s]
[Lisbon, Portugal] Parsing pbf file #1: 100%|██████████| 764106/764106 [00:19<00:00, 39987.61it/s]
Grouping features: 100%|██████████| 23585/23585 [00:07<00:00, 3174.76it/s]
feature_id | geometry | aerialway | airports | sustenance | education | transportation | finances | healthcare | culture_art_entertainment | other | buildings | emergency | historic | leisure | shops | sport | tourism | greenery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
node/21433772 | POINT (-9.19059 38.72880) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/21433776 | POINT (-9.19376 38.72666) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414208 | POINT (-9.16568 38.74047) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414256 | POINT (-9.10320 38.74623) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414265 | POINT (-9.10243 38.74785) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
way/1183879673 | MULTIPOLYGON (((-9.13106 38.74166, -9.13104 38... | NaN | NaN | NaN | NaN | amenity=motorcycle_parking | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
way/1185406284 | MULTIPOLYGON (((-9.13921 38.71254, -9.13919 38... | NaN | NaN | amenity=restaurant | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
way/1187260469 | MULTIPOLYGON (((-9.14695 38.77388, -9.14679 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | leisure=park | NaN | NaN | NaN | leisure=park |
relation/15857588 | MULTIPOLYGON (((-9.16941 38.74181, -9.16941 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | landuse=grass |
relation/12867796 | MULTIPOLYGON (((-9.15580 38.75870, -9.15578 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | leisure=park | NaN | NaN | NaN | leisure=park |
23585 rows × 18 columns
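To make the idea of a grouped filter more concrete, here is a minimal plain-Python sketch. The toy filter is hypothetical and is not BASE_OSM_GROUPS_FILTER; it only assumes that a grouped filter maps a group name to a dictionary of OSM keys and accepted values (with True meaning "any value"). The helper assign_groups is also hypothetical: it mimics how the table above stores one key=value string per group column, keeping the first matching tag, which may differ from how the loader resolves several matches.

```python
from typing import Dict, List, Optional, Union

TagValues = Union[bool, str, List[str]]
GroupedFilter = Dict[str, Dict[str, TagValues]]

# Hypothetical, heavily reduced grouped filter (NOT BASE_OSM_GROUPS_FILTER).
# Assumed shape: group name -> {OSM key -> accepted value(s)}; True = any value.
TOY_GROUPED_FILTER: GroupedFilter = {
    "sustenance": {"amenity": ["restaurant", "cafe", "bar"]},
    "transportation": {"public_transport": True, "amenity": ["motorcycle_parking"]},
    "leisure": {"leisure": ["park", "playground"]},
    "greenery": {"leisure": ["park"], "landuse": ["grass", "forest"]},
}


def _value_matches(expected: TagValues, actual: str) -> bool:
    """Check one tag value against one filter entry."""
    if isinstance(expected, bool):
        return expected  # True accepts any value, False accepts none
    if isinstance(expected, str):
        return actual == expected
    return actual in expected


def assign_groups(
    tags: Dict[str, str], grouped_filter: GroupedFilter
) -> Dict[str, Optional[str]]:
    """Return, per group, the first matching 'key=value' string or None."""
    result: Dict[str, Optional[str]] = {}
    for group, tag_filter in grouped_filter.items():
        result[group] = None
        for key, expected in tag_filter.items():
            if key in tags and _value_matches(expected, tags[key]):
                result[group] = f"{key}={tags[key]}"
                break
    return result


print(assign_groups({"leisure": "park", "name": "Jardim"}, TOY_GROUPED_FILTER))
# {'sustenance': None, 'transportation': None, 'leisure': 'leisure=park', 'greenery': 'leisure=park'}
```

As in the leisure=park rows of the table, a single OSM tag can populate more than one group column.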
Join the objects with the regions they belong to
joiner = IntersectionJoiner()
joint_gdf = joiner.transform(regions_gdf, features_gdf)
joint_gdf
region_id | feature_id |
---|---|
89393375b8bffff | way/153605598 |
89393375b13ffff | way/153605598 |
89393375b8fffff | way/153605598 |
89393375bc7ffff | way/153605598 |
89393375b8bffff | way/337051118 |
... | ... |
89393362a23ffff | node/9931268624 |
89393362a23ffff | node/257290320 |
89393362a23ffff | node/4293350006 |
89393362a23ffff | way/966853629 |
89393362a23ffff | way/966853628 |
26410 rows × 0 columns
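The joiner output is only a set of (region_id, feature_id) pairs, which is why it has 0 columns. It has more rows (26410) than there are features (23585) because a feature such as way/153605598 can intersect several regions. The plain-Python sketch below, with hypothetical toy IDs and a hypothetical count_embed helper, shows how per-region counts like those of a CountEmbedder could be derived from such pairs: for every region and group column, count the joined features whose value in that column is not null. This is our reading of the counts shown later, not the library's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def count_embed(
    region_ids: Iterable[str],
    features: Dict[str, Dict[str, Optional[str]]],
    joint_pairs: Iterable[Tuple[str, str]],
    columns: List[str],
) -> Dict[str, List[int]]:
    """Count, per region, the joined features with a non-null value in each column."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for region_id, feature_id in joint_pairs:
        for column in columns:
            if features[feature_id].get(column) is not None:
                counts[region_id][column] += 1
    # Regions without any joined feature still get an all-zero vector.
    return {rid: [counts[rid][c] for c in columns] for rid in region_ids}


# Hypothetical toy data shaped like features_gdf (group columns) and joint_gdf (pairs).
COLUMNS = ["leisure", "greenery", "shops"]
FEATURES = {
    "way/1": {"leisure": "leisure=park", "greenery": "leisure=park", "shops": None},
    "node/2": {"leisure": None, "greenery": None, "shops": "shop=bakery"},
}
JOINT = [("r1", "way/1"), ("r2", "way/1"), ("r2", "node/2")]

embeddings = count_embed(["r1", "r2", "r3"], FEATURES, JOINT, COLUMNS)
print(embeddings)  # {'r1': [1, 1, 0], 'r2': [1, 1, 1], 'r3': [0, 0, 0]}
```

Note that way/1 is counted in both r1 and r2, just as a feature spanning several H3 cells contributes to each of them.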
Embed using features existing in data
ContextualCountEmbedder extends the capabilities of the basic CountEmbedder by incorporating the neighbourhood of each embedded region. In this example, we will use the H3Neighbourhood.
h3n = H3Neighbourhood()
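H3Neighbourhood relates cells by grid distance, i.e. the number of hexagon steps between them. As a rough illustration, the sketch below assumes a flat, infinite hexagonal grid in axial coordinates rather than the real H3 index (which lives on a sphere and contains pentagons). On such a grid the ring at distance k holds 6k cells, so neighbourhood_distance=10 reaches up to 330 neighbouring cells around each region; near the boundary of the area, fewer of these cells are present in regions_gdf. The helper names are hypothetical.

```python
from typing import Set, Tuple

Hex = Tuple[int, int]  # axial coordinates (q, r) on a flat, infinite hex grid


def hex_distance(a: Hex, b: Hex) -> int:
    """Number of hexagon steps between two axial coordinates."""
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def neighbours_at_distance(center: Hex, k: int) -> Set[Hex]:
    """All cells exactly k steps away from center (a 'ring')."""
    return {
        (center[0] + dq, center[1] + dr)
        for dq in range(-k, k + 1)
        for dr in range(-k, k + 1)
        if hex_distance(center, (center[0] + dq, center[1] + dr)) == k
    }


ring_sizes = [len(neighbours_at_distance((0, 0), k)) for k in range(4)]
print(ring_sizes)  # [1, 6, 12, 18]
print(sum(len(neighbours_at_distance((0, 0), k)) for k in range(1, 11)))  # 330
```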
Squashed vector version (default)
The embedder returns a vector of the same length as the CountEmbedder output. For each region, it adds to the region's own counts the values averaged over its neighbours at each distance, diminished by the square of that distance.
cce = ContextualCountEmbedder(
neighbourhood=h3n, neighbourhood_distance=10, concatenate_vectors=False
)
embeddings = cce.transform(regions_gdf, features_gdf, joint_gdf)
embeddings
Generating embeddings: 100%|██████████| 830/830 [00:03<00:00, 211.22it/s]
region_id | aerialway | airports | sustenance | education | transportation | finances | healthcare | culture_art_entertainment | other | buildings | emergency | historic | leisure | shops | sport | tourism | greenery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
89393375b8bffff | 0.003705 | 0.052774 | 0.333977 | 0.300545 | 11.337748 | 0.101101 | 0.378020 | 0.024072 | 0.320187 | 0.225556 | 0.006119 | 0.097582 | 18.504241 | 0.875088 | 5.707852 | 1.220070 | 21.942691 |
893933674afffff | 0.002505 | 0.004717 | 0.439397 | 2.421084 | 14.587863 | 0.070648 | 0.303474 | 0.039515 | 1.126914 | 1.364482 | 0.002931 | 0.072529 | 4.950668 | 0.590286 | 0.665959 | 0.174296 | 9.025216 |
8939336669bffff | 0.125936 | 0.007257 | 9.071953 | 3.400959 | 11.698153 | 1.279454 | 1.399608 | 0.034032 | 1.097324 | 0.567189 | 0.135554 | 0.088424 | 9.165266 | 8.265133 | 1.362896 | 4.767990 | 7.116095 |
89393362c07ffff | 0.004075 | 0.000000 | 5.488408 | 4.802693 | 18.906114 | 4.494248 | 0.438514 | 0.074183 | 0.268173 | 0.497544 | 0.000721 | 0.200010 | 6.331211 | 24.652327 | 2.772531 | 0.728489 | 2.695897 |
89393362b4bffff | 0.000000 | 0.000000 | 7.433913 | 0.326376 | 21.071278 | 0.391362 | 0.314791 | 1.318935 | 2.598337 | 9.610583 | 0.000000 | 5.176153 | 8.413458 | 1.852803 | 0.213532 | 4.561278 | 2.352847 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
893933674bbffff | 0.001166 | 0.008133 | 0.437449 | 0.571631 | 24.069191 | 0.094435 | 1.236929 | 0.130727 | 0.160772 | 0.289877 | 0.001406 | 0.062576 | 1.560361 | 2.679245 | 0.486474 | 0.229522 | 11.902632 |
89393375e03ffff | 0.001435 | 0.030241 | 6.009123 | 3.601497 | 33.927844 | 0.426508 | 1.614956 | 0.068912 | 0.214530 | 2.384254 | 0.001589 | 0.460167 | 4.671317 | 14.627670 | 0.464788 | 0.537932 | 24.733279 |
89393362dbbffff | 0.005370 | 0.008355 | 3.382429 | 3.711934 | 18.298789 | 1.331939 | 0.476215 | 0.026609 | 3.192640 | 0.608176 | 0.006944 | 1.183956 | 9.721074 | 3.432384 | 2.536626 | 3.389757 | 11.762104 |
89393362a23ffff | 0.000000 | 0.000000 | 14.740303 | 1.345512 | 46.997554 | 4.299375 | 2.497101 | 0.117040 | 1.584218 | 6.065698 | 0.000000 | 3.898279 | 7.592059 | 22.862262 | 0.344471 | 5.100315 | 13.300876 |
89393375c5bffff | 0.000000 | 0.023091 | 0.133048 | 0.128782 | 3.327121 | 0.055845 | 0.091796 | 0.008943 | 0.166479 | 0.078493 | 0.000650 | 0.114407 | 1.363263 | 0.245880 | 0.486365 | 0.102227 | 3.042637 |
830 rows × 17 columns
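To make the squashed computation described above more concrete, here is a minimal plain-Python sketch. It assumes the weighting stated in the text: the region's own counts plus, for each distance d from 1 upwards, the element-wise average of the neighbour vectors at that distance divided by d². The exact distance offset used by the library is an assumption here, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def average_vectors(vectors: Sequence[Sequence[float]], length: int) -> List[float]:
    """Element-wise mean of the given vectors; an empty ring contributes zeros."""
    if not vectors:
        return [0.0] * length
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(length)]


def squashed_embedding(
    own_counts: Sequence[float],
    rings: Dict[int, Sequence[Sequence[float]]],
) -> List[float]:
    """Own counts plus each ring's average divided by distance squared.

    ASSUMPTION: ring keys start at 1 and ring d is weighted by 1 / d**2,
    following the text; the library may offset the distance differently.
    """
    length = len(own_counts)
    result = [float(x) for x in own_counts]
    for distance, vectors in rings.items():
        avg = average_vectors(vectors, length)
        for i in range(length):
            result[i] += avg[i] / distance**2
    return result


print(squashed_embedding([4, 0], {1: [[2, 0], [4, 2]], 2: [[8, 4]]}))  # [9.0, 2.0]
```

Here two neighbours [2, 0] and [4, 2] at distance 1 average to [3, 1], and the single neighbour [8, 4] at distance 2 contributes [8/4, 4/4], giving [4 + 3 + 2, 0 + 1 + 1] = [9.0, 2.0]. Because distant rings are down-weighted, the vector keeps the CountEmbedder length while still reflecting the surroundings.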
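Concatenated vector version
The embedder returns a vector of length n × (neighbourhood_distance + 1), where n is the number of features from the CountEmbedder and neighbourhood_distance is the number of neighbourhood rings analysed; the extra block holds the region's own counts at distance 0. With 17 features and neighbourhood_distance=10, this gives 17 × 11 = 187 columns. Each feature name is suffixed with _d, where d is the corresponding distance (from 0 to neighbourhood_distance). At each distance, values are averaged over all neighbours at that distance.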
wide_cce = ContextualCountEmbedder(
neighbourhood=h3n, neighbourhood_distance=10, concatenate_vectors=True
)
wide_embeddings = wide_cce.transform(regions_gdf, features_gdf, joint_gdf)
wide_embeddings
Generating embeddings: 100%|██████████| 830/830 [00:03<00:00, 212.97it/s]
region_id | aerialway_0 | airports_0 | sustenance_0 | education_0 | transportation_0 | finances_0 | healthcare_0 | culture_art_entertainment_0 | other_0 | buildings_0 | ... | culture_art_entertainment_10 | other_10 | buildings_10 | emergency_10 | historic_10 | leisure_10 | shops_10 | sport_10 | tourism_10 | greenery_10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
89393375b8bffff | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.142857 | 0.400000 | 0.714286 | 0.000000 | 0.571429 | 4.114286 | 2.657143 | 1.114286 | 1.514286 | 3.571429 |
893933674afffff | 0.0 | 0.0 | 0.0 | 2.0 | 9.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ... | 0.323529 | 0.941176 | 2.294118 | 0.058824 | 0.470588 | 3.411765 | 6.588235 | 0.441176 | 2.176471 | 3.029412 |
8939336669bffff | 0.0 | 0.0 | 7.0 | 3.0 | 6.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | ... | 0.000000 | 0.111111 | 0.277778 | 0.000000 | 0.000000 | 3.166667 | 0.722222 | 0.555556 | 0.333333 | 4.555556 |
89393362c07ffff | 0.0 | 0.0 | 4.0 | 4.0 | 12.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.147059 | 0.529412 | 1.058824 | 0.000000 | 1.117647 | 4.000000 | 3.441176 | 1.147059 | 1.735294 | 4.558824 |
89393362b4bffff | 0.0 | 0.0 | 4.0 | 0.0 | 16.0 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 | ... | 0.343750 | 1.343750 | 1.718750 | 0.000000 | 1.250000 | 4.437500 | 6.312500 | 1.031250 | 3.406250 | 5.000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
893933674bbffff | 0.0 | 0.0 | 0.0 | 0.0 | 19.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.297297 | 0.864865 | 1.756757 | 0.027027 | 1.054054 | 3.459459 | 4.405405 | 0.945946 | 3.378378 | 4.513514 |
89393375e03ffff | 0.0 | 0.0 | 5.0 | 3.0 | 28.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | ... | 0.233333 | 0.333333 | 0.900000 | 0.000000 | 0.433333 | 4.100000 | 2.633333 | 1.633333 | 2.400000 | 7.133333 |
89393362dbbffff | 0.0 | 0.0 | 2.0 | 3.0 | 12.0 | 1.0 | 0.0 | 0.0 | 3.0 | 0.0 | ... | 0.133333 | 0.466667 | 1.222222 | 0.000000 | 0.444444 | 2.533333 | 4.666667 | 0.444444 | 1.644444 | 3.733333 |
89393362a23ffff | 0.0 | 0.0 | 12.0 | 1.0 | 40.0 | 4.0 | 2.0 | 0.0 | 1.0 | 5.0 | ... | 0.621622 | 0.972973 | 1.891892 | 0.000000 | 1.702703 | 4.810811 | 3.648649 | 0.567568 | 4.783784 | 5.027027 |
89393375c5bffff | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.043478 | 0.260870 | 1.130435 | 0.000000 | 0.173913 | 2.956522 | 1.695652 | 0.521739 | 0.521739 | 5.391304 |
830 rows × 187 columns
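The sketch below, in plain Python with hypothetical helper names, reproduces the column naming and vector layout of the concatenated version using the 17 group names from the squashed table. All values shown for the first region at distance 10 are multiples of 1/35 (for example 4.114286 ≈ 144/35), which suggests each average runs only over the ring cells present in regions_gdf (35 of the 60 cells a full ring at distance 10 has); the sketch therefore averages over whatever neighbour vectors it is given, which is an assumption rather than the library's documented behaviour.

```python
from typing import Dict, List, Sequence

# Group names as they appear in the squashed embeddings table.
GROUPS = [
    "aerialway", "airports", "sustenance", "education", "transportation",
    "finances", "healthcare", "culture_art_entertainment", "other", "buildings",
    "emergency", "historic", "leisure", "shops", "sport", "tourism", "greenery",
]


def concatenated_columns(base_columns: Sequence[str], max_distance: int) -> List[str]:
    """Column names grouped by distance: all *_0 columns first, then *_1, and so on."""
    return [f"{col}_{d}" for d in range(max_distance + 1) for col in base_columns]


def concatenated_embedding(
    own_counts: Sequence[float],
    rings: Dict[int, Sequence[Sequence[float]]],
    max_distance: int,
) -> List[float]:
    """Own counts followed by the per-ring averages for distances 1..max_distance."""
    length = len(own_counts)
    result = [float(x) for x in own_counts]
    for distance in range(1, max_distance + 1):
        vectors = rings.get(distance, [])
        if vectors:
            result.extend(sum(v[i] for v in vectors) / len(vectors) for i in range(length))
        else:
            result.extend([0.0] * length)  # no neighbours present at this distance
    return result


columns = concatenated_columns(GROUPS, 10)
print(len(columns), columns[0], columns[-1])  # 187 aerialway_0 greenery_10
print(concatenated_embedding([4, 0], {1: [[2, 0], [4, 2]]}, 2))  # [4.0, 0.0, 3.0, 1.0, 0.0, 0.0]
```

Unlike the squashed version, no distance weighting is applied; keeping each ring in its own block lets a downstream model learn how much each distance matters.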
Plotting example features
plot_numeric_data(regions_gdf, embeddings, "leisure", tiles_style="CartoDB positron")
plot_numeric_data(regions_gdf, embeddings, "transportation", tiles_style="CartoDB positron")