Contextual count embedder
from srai.loaders.osm_loaders import OSMPbfLoader
from srai.regionalizers import H3Regionalizer
from srai.joiners import IntersectionJoiner
from srai.embedders import ContextualCountEmbedder
from srai.plotting.folium_wrapper import plot_regions, plot_numeric_data
from srai.neighbourhoods import H3Neighbourhood
Data preparation
In order to use ContextualCountEmbedder, we need to prepare some data. Namely, we need regions_gdf, features_gdf and joint_gdf. These are the outputs of Regionalizers, Loaders and Joiners, respectively.
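To make the relationship between the three inputs concrete, here is a minimal sketch with toy pandas frames (no geometries; the ids and tags are made up). It assumes, as in the outputs shown further down, that features_gdf has one column per feature group holding key=value strings or NaN, and that joint_gdf is indexed by (region_id, feature_id) pairs. Counting the non-null group values per region approximates what a plain CountEmbedder produces; this is an illustration, not srai's implementation.

```python
import numpy as np
import pandas as pd

# Toy stand-in for regions_gdf: only the index matters here (geometries omitted).
regions = pd.DataFrame(index=pd.Index(["r1", "r2", "r3"], name="region_id"))

# Toy stand-in for features_gdf: one column per feature group, holding
# "key=value" strings where the feature matches the group and NaN otherwise.
features = pd.DataFrame(
    {
        "transportation": ["public_transport=stop_position", np.nan, np.nan],
        "leisure": [np.nan, "leisure=park", "leisure=garden"],
        "greenery": [np.nan, "leisure=park", np.nan],
    },
    index=pd.Index(["node/1", "way/2", "way/3"], name="feature_id"),
)

# Toy stand-in for joint_gdf: a (region_id, feature_id) MultiIndex and no columns.
# A feature may appear under several regions (way/2 intersects r1 and r2).
joint = pd.DataFrame(
    index=pd.MultiIndex.from_tuples(
        [("r1", "node/1"), ("r1", "way/2"), ("r2", "way/2"), ("r2", "way/3")],
        names=["region_id", "feature_id"],
    )
)

# Attach each feature's group values to every region it intersects,
# count non-null entries per region and group, then add empty regions as zeros.
joined = joint.reset_index().merge(features, left_on="feature_id", right_index=True)
counts = (
    joined.drop(columns="feature_id")
    .groupby("region_id")
    .count()
    .reindex(regions.index, fill_value=0)
)
print(counts)
```

Region r3 has no features, so it still appears, with all counts equal to zero.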
from srai.utils import geocode_to_region_gdf
area_gdf = geocode_to_region_gdf("Lisboa, PT")
plot_regions(area_gdf)
Regionalize the area using an H3Regionalizer
regionalizer = H3Regionalizer(resolution=9, buffer=True)
regions_gdf = regionalizer.transform(area_gdf)
regions_gdf
region_id | geometry |
---|---|
89393375edbffff | POLYGON ((-9.17724 38.77393, -9.17492 38.77340... |
89393362a13ffff | POLYGON ((-9.18800 38.70896, -9.18568 38.70843... |
89393362c47ffff | POLYGON ((-9.20019 38.73939, -9.19787 38.73886... |
8939336058bffff | POLYGON ((-9.21678 38.70513, -9.21507 38.70647... |
89393362843ffff | POLYGON ((-9.18365 38.72180, -9.18133 38.72127... |
... | ... |
89393362b6fffff | POLYGON ((-9.14919 38.70186, -9.14687 38.70133... |
89393375bdbffff | POLYGON ((-9.12741 38.76604, -9.12912 38.76470... |
89393362867ffff | POLYGON ((-9.16756 38.72506, -9.16926 38.72371... |
893933674cbffff | POLYGON ((-9.13087 38.73744, -9.12917 38.73878... |
89393360523ffff | POLYGON ((-9.19992 38.69957, -9.19821 38.70091... |
830 rows × 1 columns
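The region_id values above are H3 cell indexes written in hexadecimal. As a quick sanity check that they are resolution-9 cells (matching resolution=9 passed to H3Regionalizer), the resolution can be read straight from the index bits: in the H3 64-bit index layout, bits 52 to 55 hold the resolution and bits 59 to 62 hold the mode (1 for a cell). The sketch below decodes both with plain integer arithmetic, without the h3 package; the helper names are hypothetical.

```python
def h3_resolution(cell: str) -> int:
    """Return the resolution (0-15) stored in bits 52-55 of an H3 index (hypothetical helper)."""
    return (int(cell, 16) >> 52) & 0xF


def h3_mode(cell: str) -> int:
    """Return the index mode stored in bits 59-62; 1 means 'cell' (hypothetical helper)."""
    return (int(cell, 16) >> 59) & 0xF


# A few region ids copied from the table above.
sample_ids = ["89393375edbffff", "89393362a13ffff", "8939336058bffff", "893933674cbffff"]
for cell in sample_ids:
    print(cell, "mode:", h3_mode(cell), "resolution:", h3_resolution(cell))
```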
Download some objects from OpenStreetMap
You can use both osm_tags_type and grouped_osm_tags_type filters. In this example, the predefined grouped_osm_tags_type filter BASE_OSM_GROUPS_FILTER is used.
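As a rough illustration of the difference between the two filter kinds: a tags filter maps OSM keys to accepted values, while a grouped filter maps group names (the column names of features_gdf, such as transportation or greenery) to such tags filters. The sketch below uses a tiny hypothetical grouped filter, not the contents of BASE_OSM_GROUPS_FILTER, and a simplified matching rule of its own, to show how a feature's tags can end up as key=value strings in one or more group columns. The type aliases only mirror the assumed shape of osm_tags_type and grouped_osm_tags_type.

```python
from typing import Dict, List, Optional, Union

# Assumed shapes: a tags filter maps an OSM key to True (any value), a single
# value, or a list of values; a grouped filter maps a group name to a tags filter.
TagsFilter = Dict[str, Union[bool, str, List[str]]]
GroupedTagsFilter = Dict[str, TagsFilter]

# Hypothetical, tiny grouped filter (NOT the real BASE_OSM_GROUPS_FILTER).
toy_grouped_filter: GroupedTagsFilter = {
    "transportation": {"public_transport": ["stop_position", "platform"]},
    "leisure": {"leisure": ["park", "garden"]},
    "greenery": {"leisure": "park", "landuse": ["grass", "forest"]},
}


def _value_matches(rule: Union[bool, str, List[str]], value: str) -> bool:
    if isinstance(rule, bool):
        return rule  # True accepts any value, False accepts none
    if isinstance(rule, str):
        return value == rule
    return value in rule  # list of accepted values


def assign_groups(tags: Dict[str, str], grouped: GroupedTagsFilter) -> Dict[str, Optional[str]]:
    """Simplified matcher: per group, the first matching 'key=value' string, else None."""
    result: Dict[str, Optional[str]] = {}
    for group, tag_filter in grouped.items():
        result[group] = next(
            (
                f"{key}={tags[key]}"
                for key, rule in tag_filter.items()
                if key in tags and _value_matches(rule, tags[key])
            ),
            None,
        )
    return result


# A park matches two groups, like relation/12867796 in the table below.
print(assign_groups({"leisure": "park", "name": "Example"}, toy_grouped_filter))
```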
from srai.loaders.osm_loaders.filters import BASE_OSM_GROUPS_FILTER
loader = OSMPbfLoader()
features_gdf = loader.load(area_gdf, tags=BASE_OSM_GROUPS_FILTER)
features_gdf
feature_id | geometry | aerialway | airports | sustenance | education | transportation | finances | healthcare | culture_art_entertainment | other | buildings | emergency | historic | leisure | shops | sport | tourism | greenery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
node/21433772 | POINT (-9.19059 38.72880) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/21433776 | POINT (-9.19376 38.72666) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414208 | POINT (-9.16568 38.74047) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414256 | POINT (-9.10320 38.74623) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414265 | POINT (-9.10243 38.74785) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
way/1182799455 | MULTIPOLYGON (((-9.20836 38.69626, -9.20828 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | leisure=garden | NaN | NaN | NaN | NaN |
way/1183879672 | MULTIPOLYGON (((-9.13064 38.74143, -9.13056 38... | NaN | NaN | NaN | NaN | amenity=motorcycle_parking | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
way/1183879673 | MULTIPOLYGON (((-9.13106 38.74166, -9.13104 38... | NaN | NaN | NaN | NaN | amenity=motorcycle_parking | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/15857588 | MULTIPOLYGON (((-9.16941 38.74181, -9.16941 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | landuse=grass |
relation/12867796 | MULTIPOLYGON (((-9.15580 38.75870, -9.15578 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | leisure=park | NaN | NaN | NaN | leisure=park |
23534 rows × 18 columns
Join the objects with the regions they belong to
joiner = IntersectionJoiner()
joint_gdf = joiner.transform(regions_gdf, features_gdf)
joint_gdf
region_id | feature_id |
---|---|
89393375edbffff | way/501125559 |
893933753abffff | way/501125559 |
8939337533bffff | way/501125559 |
89393375337ffff | way/501125559 |
89393375307ffff | way/501125559 |
... | ... |
89393360523ffff | node/7967814481 |
89393360523ffff | node/7967814480 |
89393360523ffff | node/8685265790 |
89393360523ffff | node/8685265791 |
89393360523ffff | way/1148579365 |
26358 rows × 0 columns
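The joint table has more rows (26358) than there are features (23534) because a feature that crosses several hexagons, like way/501125559 above, is paired with every region it intersects. The sketch below reproduces that one-to-many behaviour with a brute-force intersection test on axis-aligned rectangles. It is a deliberate simplification (a hypothetical Box type, no real geometries, no spatial index), not how IntersectionJoiner is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a real geometry (hypothetical)."""

    minx: float
    miny: float
    maxx: float
    maxy: float

    def intersects(self, other: "Box") -> bool:
        # Closed rectangles intersect unless one lies entirely to one side of the other.
        return not (
            self.maxx < other.minx
            or other.maxx < self.minx
            or self.maxy < other.miny
            or other.maxy < self.miny
        )


def intersection_join(regions: Dict[str, Box], features: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Return every (region_id, feature_id) pair whose shapes intersect."""
    return [
        (region_id, feature_id)
        for region_id, region in regions.items()
        for feature_id, feature in features.items()
        if region.intersects(feature)
    ]


# Two adjacent unit "regions" and three features; way/1 straddles their shared edge.
regions = {"A": Box(0, 0, 1, 1), "B": Box(1, 0, 2, 1)}
features = {
    "node/1": Box(0.5, 0.5, 0.5, 0.5),  # a point inside A
    "way/1": Box(0.8, 0.2, 1.6, 0.4),  # a line crossing from A into B
    "node/2": Box(5, 5, 5, 5),  # outside both regions, so it is dropped
}
pairs = intersection_join(regions, features)
print(pairs)
```

Here two features produce three pairs, which is the same effect that makes joint_gdf longer than features_gdf.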
Embed using features existing in data
ContextualCountEmbedder extends the capabilities of the basic CountEmbedder by also taking into account the neighbourhood of each embedded region. In this example we use H3Neighbourhood, which defines neighbours on the H3 grid.
h3n = H3Neighbourhood()
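The neighbourhood_distance parameter used below refers to rings of cells around each region: distance k means the cells exactly k steps away on the hexagonal grid. The sketch below shows that idea on a plain hexagonal grid in axial coordinates, as a hypothetical stand-in for H3Neighbourhood (H3 is close to, but not exactly, such a grid because it contains pentagons). On a full grid, ring k holds 6k cells, so ring 10 has at most 60 cells.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # axial coordinates (q, r) on a hexagonal grid

# The six axial directions to adjacent hexagons.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_distance(a: Cell, b: Cell) -> int:
    """Number of steps between two hexagons given in axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def ring(center: Cell, k: int) -> Set[Cell]:
    """Cells exactly k steps from center, found by breadth-first expansion."""
    seen = {center}
    frontier = {center}
    for _ in range(k):
        # New cells adjacent to the current frontier are exactly one step further out.
        frontier = {(q + dq, r + dr) for q, r in frontier for dq, dr in DIRECTIONS} - seen
        seen |= frontier
    return frontier


for k in range(1, 4):
    print(k, len(ring((0, 0), k)))
```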
Squashed vector version (default)
The embedder returns a vector of the same length as CountEmbedder. To each region's own counts it adds, for every distance up to neighbourhood_distance, the counts averaged over the neighbours at that distance and divided by the square of that distance.
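Read literally, the description above amounts to: squashed = own counts + the sum over k = 1..distance of (mean counts of the neighbours at distance k) / k². The NumPy sketch below computes that for a single region from per-ring count arrays. It follows the wording of this section; the squash helper is hypothetical, and the exact weighting used inside srai may differ between versions.

```python
import numpy as np


def squash(own_counts: np.ndarray, rings: list) -> np.ndarray:
    """Add ring averages, damped by the squared ring distance, to a region's own counts.

    own_counts: shape (n_features,), counts for the region itself (distance 0).
    rings: rings[k - 1] has shape (n_neighbours_at_k, n_features) and holds
           the counts of every neighbour at distance k.
    Rings without any neighbours contribute nothing.
    """
    result = own_counts.astype(float)  # astype returns a copy, so the input is untouched
    for k, ring_counts in enumerate(rings, start=1):
        if len(ring_counts) == 0:
            continue
        result += ring_counts.mean(axis=0) / k**2
    return result


own = np.array([2.0, 0.0])  # e.g. [transportation, leisure]
rings = [
    np.array([[4.0, 1.0], [0.0, 1.0]]),  # two neighbours at distance 1 (mean [2, 1])
    np.array([[8.0, 0.0]]),  # one neighbour at distance 2 (mean [8, 0], divided by 4)
]
print(squash(own, rings))
```

This weighting also explains why the squashed values in the table below are fractional even though the raw counts are integers.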
cce = ContextualCountEmbedder(
neighbourhood=h3n, neighbourhood_distance=10, concatenate_vectors=False
)
embeddings = cce.transform(regions_gdf, features_gdf, joint_gdf)
embeddings
region_id | aerialway | airports | sustenance | education | transportation | finances | healthcare | culture_art_entertainment | other | buildings | emergency | historic | leisure | shops | sport | tourism | greenery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
89393375edbffff | 0.000517 | 0.006110 | 0.506122 | 1.267620 | 4.872905 | 0.112879 | 0.156736 | 0.017523 | 0.163566 | 0.346716 | 0.005265 | 0.598560 | 9.285256 | 0.560436 | 5.330924 | 0.295255 | 3.868214 |
89393362a13ffff | 0.000000 | 0.000000 | 1.871256 | 0.414449 | 3.146475 | 0.089227 | 0.180573 | 0.106206 | 0.202513 | 0.270041 | 0.000000 | 3.310236 | 6.709864 | 0.751537 | 0.594018 | 2.631834 | 4.061224 |
89393362c47ffff | 0.003079 | 0.000000 | 0.529584 | 2.380833 | 3.448962 | 0.128346 | 0.166346 | 0.021761 | 0.165200 | 0.291367 | 0.000000 | 1.448721 | 3.022629 | 0.732349 | 0.508401 | 3.829190 | 3.621499 |
8939336058bffff | 0.000000 | 0.000000 | 0.465067 | 1.403370 | 7.517218 | 0.066004 | 1.196279 | 0.074523 | 0.170623 | 5.044112 | 0.000000 | 0.194903 | 18.891414 | 0.319483 | 1.716718 | 2.917584 | 1.688575 |
89393362843ffff | 0.001641 | 0.000000 | 0.482651 | 0.125161 | 3.841818 | 0.079249 | 0.102342 | 0.057111 | 1.146788 | 0.177878 | 0.000000 | 0.194817 | 2.572026 | 0.555815 | 0.474582 | 1.058103 | 1.109212 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
89393362b6fffff | 0.000000 | 0.000000 | 11.965530 | 0.360005 | 10.966990 | 1.098217 | 0.644391 | 0.618456 | 0.748023 | 1.697312 | 0.000000 | 1.704065 | 1.167557 | 5.902948 | 0.102689 | 3.252516 | 1.527671 |
89393375bdbffff | 0.001628 | 0.170667 | 0.367418 | 0.266847 | 7.342994 | 0.099945 | 0.201005 | 0.065712 | 0.168078 | 0.222231 | 0.002132 | 0.087490 | 4.418927 | 0.724823 | 1.394553 | 0.255990 | 13.776310 |
89393362867ffff | 0.001641 | 0.000000 | 5.240758 | 0.492133 | 29.478118 | 0.739719 | 1.669537 | 0.127623 | 2.527958 | 1.297166 | 0.000176 | 3.939958 | 5.359292 | 10.325424 | 0.388158 | 4.095598 | 7.511143 |
893933674cbffff | 0.000184 | 0.002810 | 5.940189 | 0.585222 | 14.506640 | 0.885639 | 2.799263 | 0.083080 | 2.407836 | 0.521798 | 0.000268 | 0.425256 | 5.009469 | 5.106121 | 0.642266 | 1.520991 | 5.838581 |
89393360523ffff | 0.000000 | 0.000000 | 3.014166 | 0.545753 | 12.745386 | 0.322565 | 2.363125 | 0.089135 | 1.642907 | 0.450260 | 0.000000 | 2.557576 | 13.735630 | 5.762594 | 3.717631 | 1.307303 | 9.454961 |
830 rows × 17 columns
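Concatenated vector version
The embedder returns a vector of length n × (distance + 1), where n is the number of feature columns produced by CountEmbedder and distance is the neighbourhood_distance. The extra 1 accounts for distance 0, i.e. the region's own counts: with 17 features and neighbourhood_distance=10 this gives 17 × 11 = 187 columns. Each feature name gets a _k suffix, where k is the distance it describes. The values at each distance are averaged over all neighbours at that distance.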
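A minimal pandas sketch of the wide layout: the averaged vectors for distances 0 to distance are placed side by side, and each column name gets the _k suffix, in the same distance-major order as the table below (all _0 columns first, then _1, and so on). The widen helper and the toy numbers are hypothetical; the sketch only illustrates the naming and the column count, not srai's code.

```python
import numpy as np
import pandas as pd


def widen(feature_names, ring_means):
    """Concatenate per-distance averages into one row with suffixed column names.

    ring_means[k] is the averaged count vector at distance k
    (k = 0 is the region's own counts).
    """
    columns = [f"{name}_{k}" for k in range(len(ring_means)) for name in feature_names]
    return pd.Series(np.concatenate(ring_means), index=columns)


features = ["transportation", "leisure"]
ring_means = [
    np.array([2.0, 0.0]),  # distance 0: the region itself
    np.array([2.0, 1.0]),  # distance 1: mean over its neighbours
    np.array([8.0, 0.0]),  # distance 2
]
row = widen(features, ring_means)
print(row)

# Width check for the Lisbon example: 17 feature groups, distances 0..10.
n_features, distance = 17, 10
print(n_features * (distance + 1))
```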
wide_cce = ContextualCountEmbedder(
neighbourhood=h3n, neighbourhood_distance=10, concatenate_vectors=True
)
wide_embeddings = wide_cce.transform(regions_gdf, features_gdf, joint_gdf)
wide_embeddings
region_id | aerialway_0 | airports_0 | sustenance_0 | education_0 | transportation_0 | finances_0 | healthcare_0 | culture_art_entertainment_0 | other_0 | buildings_0 | ... | culture_art_entertainment_10 | other_10 | buildings_10 | emergency_10 | historic_10 | leisure_10 | shops_10 | sport_10 | tourism_10 | greenery_10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
89393375edbffff | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.156250 | 0.375000 | 0.531250 | 0.000000 | 0.343750 | 2.281250 | 3.250000 | 0.468750 | 1.718750 | 3.187500 |
89393362a13ffff | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.424242 | 1.151515 | 2.818182 | 0.000000 | 1.727273 | 4.181818 | 4.848485 | 0.787879 | 3.636364 | 2.666667 |
89393362c47ffff | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.324324 | 0.432432 | 0.729730 | 0.000000 | 0.486486 | 4.135135 | 2.729730 | 1.486486 | 1.081081 | 6.486486 |
8939336058bffff | 0.0 | 0.0 | 0.0 | 1.0 | 5.0 | 0.0 | 1.0 | 0.0 | 0.0 | 4.0 | ... | 0.058824 | 0.352941 | 0.352941 | 0.000000 | 0.470588 | 4.235294 | 1.705882 | 1.882353 | 1.470588 | 2.176471 |
89393362843ffff | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.400000 | 1.050000 | 1.850000 | 0.000000 | 1.650000 | 5.050000 | 5.050000 | 1.175000 | 3.225000 | 4.125000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
89393362b6fffff | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.357143 | 0.785714 | 2.357143 | 0.000000 | 1.071429 | 3.035714 | 10.750000 | 0.785714 | 2.750000 | 3.357143 |
89393375bdbffff | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.086957 | 0.326087 | 1.086957 | 0.000000 | 0.304348 | 3.673913 | 6.717391 | 1.130435 | 1.239130 | 4.326087 |
89393362867ffff | 0.0 | 0.0 | 2.0 | 0.0 | 24.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | ... | 0.234043 | 0.936170 | 1.106383 | 0.021277 | 1.276596 | 4.276596 | 5.914894 | 1.170213 | 3.127660 | 4.744681 |
893933674cbffff | 0.0 | 0.0 | 2.0 | 0.0 | 5.0 | 0.0 | 2.0 | 0.0 | 2.0 | 0.0 | ... | 0.266667 | 1.022222 | 1.933333 | 0.000000 | 2.288889 | 4.511111 | 4.911111 | 1.755556 | 3.466667 | 5.711111 |
89393360523ffff | 0.0 | 0.0 | 1.0 | 0.0 | 8.0 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | ... | 0.105263 | 1.105263 | 0.789474 | 0.000000 | 0.894737 | 3.842105 | 0.368421 | 0.631579 | 1.473684 | 1.315789 |
830 rows × 187 columns
Plotting example features
plot_numeric_data(regions_gdf, embeddings, "leisure", tiles_style="CartoDB positron")
plot_numeric_data(regions_gdf, embeddings, "transportation", tiles_style="CartoDB positron")