Contextual count embedder
from srai.loaders.osm_loaders import OSMPbfLoader
from srai.regionalizers import H3Regionalizer
from srai.joiners import IntersectionJoiner
from srai.embedders import ContextualCountEmbedder
from srai.plotting.folium_wrapper import plot_regions, plot_numeric_data
from srai.neighbourhoods import H3Neighbourhood
Data preparation
To use ContextualCountEmbedder, we need to prepare some data: regions_gdf, features_gdf, and joint_gdf. These are the outputs of Regionalizers, Loaders, and Joiners, respectively.
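Conceptually, joint_gdf links region ids to feature ids, and each feature carries one value per category group. The sketch below uses hypothetical toy data in plain Python dictionaries instead of GeoDataFrames and mimics the kind of per-region category counting that CountEmbedder performs. It is an illustration of how the three inputs fit together, not srai's implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy stand-ins for the three inputs (not real srai objects):
# regions  -> region ids (like the index of regions_gdf)
# features -> feature id -> {category: "key=value"} (non-NaN cells of features_gdf)
# joint    -> (region_id, feature_id) pairs (like the index of joint_gdf)
regions: List[str] = ["r1", "r2", "r3"]
features: Dict[str, Dict[str, str]] = {
    "node/1": {"transportation": "public_transport=stop_position"},
    "node/2": {"sustenance": "amenity=cafe"},
    "way/3": {"leisure": "leisure=park", "greenery": "leisure=park"},
}
joint: List[Tuple[str, str]] = [("r1", "node/1"), ("r1", "node/2"), ("r2", "way/3"), ("r3", "way/3")]
categories: List[str] = ["sustenance", "transportation", "leisure", "greenery"]


def count_per_region(regions, features, joint, categories) -> Dict[str, List[int]]:
    """Count, for each region, how many joined features have a value in each category."""
    counts = {region_id: Counter() for region_id in regions}
    for region_id, feature_id in joint:
        # Only categories present for a feature are counted (absent = NaN in the table)
        for category in features[feature_id]:
            counts[region_id][category] += 1
    # Dense row per region, so every region gets a vector of the same length
    return {r: [counts[r][c] for c in categories] for r in regions}


print(count_per_region(regions, features, joint, categories))
```

Note that way/3 is counted in both r2 and r3, because the join lists it under both regions.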
from srai.regionalizers import geocode_to_region_gdf
area_gdf = geocode_to_region_gdf("Lisboa, PT")
plot_regions(area_gdf)
Regionalize the area using an H3Regionalizer
regionalizer = H3Regionalizer(resolution=9, buffer=True)
regions_gdf = regionalizer.transform(area_gdf)
regions_gdf
region_id | geometry |
---|---|
89393362887ffff | POLYGON ((-9.17713 38.74106, -9.17883 38.73972... |
893933675d7ffff | POLYGON ((-9.10176 38.75328, -9.10346 38.75194... |
89393362b9bffff | POLYGON ((-9.16462 38.72266, -9.16632 38.72132... |
89393375e7bffff | POLYGON ((-9.16037 38.76836, -9.16208 38.76702... |
89393362b47ffff | POLYGON ((-9.15799 38.70905, -9.15969 38.70771... |
... | ... |
89393375977ffff | POLYGON ((-9.09890 38.78374, -9.10061 38.78240... |
8939337593bffff | POLYGON ((-9.09781 38.78694, -9.09951 38.78560... |
89393362dc7ffff | POLYGON ((-9.17571 38.75630, -9.17742 38.75496... |
8939337582fffff | POLYGON ((-9.11903 38.77968, -9.12073 38.77835... |
89393375eabffff | POLYGON ((-9.15602 38.78120, -9.15772 38.77986... |
830 rows × 1 columns
Download some objects from OpenStreetMap
You can use both OsmTagsFilter and GroupedOsmTagsFilter filters. In this example, the predefined GroupedOsmTagsFilter BASE_OSM_GROUPS_FILTER is used.
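Roughly, an OsmTagsFilter maps an OSM tag key to True (any value), a single value, or a list of values, and a GroupedOsmTagsFilter maps a group name to such a filter. The sketch below models those shapes with plain dictionaries and uses a hypothetical matcher to show why the loaded table has one column per group holding "key=value" strings. The trimmed GROUPS dictionary and the "first matching tag wins" rule are assumptions for illustration; this is not the real BASE_OSM_GROUPS_FILTER nor srai's grouping code.

```python
from typing import Dict, List, Optional, Union

# Local stand-ins mirroring the shape of srai's filter types (plain Python only)
OsmTagsFilter = Dict[str, Union[bool, str, List[str]]]
GroupedOsmTagsFilter = Dict[str, OsmTagsFilter]

# Hypothetical, heavily trimmed grouped filter (NOT the real BASE_OSM_GROUPS_FILTER)
GROUPS: GroupedOsmTagsFilter = {
    "transportation": {"public_transport": ["stop_position", "platform"]},
    "leisure": {"leisure": ["park", "garden"]},
    "greenery": {"leisure": ["park", "garden"], "landuse": "grass"},
    "buildings": {"building": True},
}


def matches(tag_filter: OsmTagsFilter, key: str, value: str) -> bool:
    """Check a single OSM tag (key, value) against one OsmTagsFilter."""
    if key not in tag_filter:
        return False
    allowed = tag_filter[key]
    if allowed is True:  # any value of this key is accepted
        return True
    if isinstance(allowed, str):  # exactly one accepted value
        return value == allowed
    if isinstance(allowed, list):  # one of several accepted values
        return value in allowed
    return False


def group_feature(tags: Dict[str, str], groups: GroupedOsmTagsFilter) -> Dict[str, Optional[str]]:
    """Return, per group, the first matching 'key=value' string, or None (NaN in the table)."""
    return {
        group_name: next((f"{k}={v}" for k, v in tags.items() if matches(tag_filter, k, v)), None)
        for group_name, tag_filter in groups.items()
    }


print(group_feature({"leisure": "park", "name": "Jardim"}, GROUPS))
```

As in the relation/7117461 row of the table below, a single feature can match several groups (here both leisure and greenery).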
from srai.loaders.osm_loaders.filters import BASE_OSM_GROUPS_FILTER
loader = OSMPbfLoader()
features_gdf = loader.load(area_gdf, tags=BASE_OSM_GROUPS_FILTER)
features_gdf
feature_id | geometry | aerialway | airports | sustenance | education | transportation | finances | healthcare | culture_art_entertainment | other | buildings | emergency | historic | leisure | shops | sport | tourism | greenery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
node/21433772 | POINT (-9.19059 38.72880) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/21433776 | POINT (-9.19376 38.72666) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414208 | POINT (-9.16663 38.74018) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414256 | POINT (-9.10286 38.74711) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414265 | POINT (-9.10273 38.74707) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
relation/8131598 | MULTIPOLYGON (((-9.14676 38.74328, -9.14662 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | leisure=garden | NaN | NaN | NaN | NaN |
relation/16158578 | MULTIPOLYGON (((-9.15193 38.72702, -9.15123 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | landuse=grass |
relation/16201238 | MULTIPOLYGON (((-9.13806 38.74037, -9.13809 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | amenity=place_of_worship | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/7117461 | MULTIPOLYGON (((-9.13065 38.74680, -9.13036 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | leisure=park | NaN | NaN | NaN | leisure=park |
relation/16527291 | MULTIPOLYGON (((-9.16432 38.73804, -9.16413 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | building=commercial | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
24110 rows × 18 columns
Join the objects with the regions they belong to
joiner = IntersectionJoiner()
joint_gdf = joiner.transform(regions_gdf, features_gdf)
joint_gdf
region_id | feature_id |
---|---|
89393362887ffff | relation/12883714 |
89393362e77ffff | relation/12883714 |
8939336059bffff | relation/12883714 |
8939336236fffff | relation/12883714 |
89393362ab3ffff | relation/12883714 |
... | ... |
8939337582fffff | way/1108508589 |
8939337582fffff | way/1108508588 |
89393375eabffff | node/11222239582 |
89393375eabffff | way/1011346010 |
89393375eabffff | node/11222184108 |
26979 rows × 0 columns
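The joiner links each feature to every region whose geometry it intersects, which is why relation/12883714 appears under several region_id values above, and why features outside all regions produce no rows. The stdlib sketch below assumes axis-aligned bounding boxes as stand-ins for real geometries, with hypothetical toy regions and features; srai's IntersectionJoiner works on the actual GeoDataFrame geometries.

```python
from typing import Dict, List, Tuple

# A box is (min_x, min_y, max_x, max_y); points are degenerate boxes.
Box = Tuple[float, float, float, float]

# Hypothetical toy regions: two adjacent unit squares (stand-ins for H3 cells)
regions: Dict[str, Box] = {"a": (0, 0, 1, 1), "b": (1, 0, 2, 1)}
features: Dict[str, Box] = {
    "node/1": (0.5, 0.5, 0.5, 0.5),      # point inside region "a"
    "relation/9": (0.2, 0.2, 1.8, 0.8),  # polygon spanning both regions
    "node/2": (5, 5, 5, 5),              # point outside every region
}


def intersects(p: Box, q: Box) -> bool:
    """Closed axis-aligned boxes intersect if they overlap on both axes."""
    return p[0] <= q[2] and q[0] <= p[2] and p[1] <= q[3] and q[1] <= p[3]


def intersection_join(regions: Dict[str, Box], features: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Emit one (region_id, feature_id) pair for every intersecting combination."""
    return [
        (region_id, feature_id)
        for region_id, r_box in regions.items()
        for feature_id, f_box in features.items()
        if intersects(r_box, f_box)
    ]


print(intersection_join(regions, features))
```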
Embed using features existing in data
ContextualCountEmbedder extends the capabilities of the basic CountEmbedder by incorporating the neighbourhood of each embedded region. In this example, we use the H3Neighbourhood.
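Because H3 cells are hexagons, "distance" here means the number of grid steps between cells, and the ring at distance d around a cell contains 6d cells (H3's few pentagons are the exception). The sketch below illustrates this with axial hexagon coordinates on a flat grid, which is an assumption for illustration only; H3Neighbourhood works on real H3 indices.

```python
from typing import List, Tuple

Hex = Tuple[int, int]  # axial coordinates (q, r) on a flat hexagonal grid


def hex_distance(a: Hex, b: Hex) -> int:
    """Number of grid steps between two hexagons in axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def ring(center: Hex, k: int) -> List[Hex]:
    """All hexagons at exactly distance k from center (k = 0 gives the center itself)."""
    cq, cr = center
    return [
        (cq + dq, cr + dr)
        # Enumerate every hexagon within distance k, then keep the outer ring only
        for dq in range(-k, k + 1)
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1)
        if hex_distance(center, (cq + dq, cr + dr)) == k
    ]


for k in range(4):
    print(k, len(ring((0, 0), k)))
```

With neighbourhood_distance=10, as used below, each region is compared against rings 1 through 10 around it.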
h3n = H3Neighbourhood()
Squashed vector version (default)
The embedder returns a vector of the same length as the CountEmbedder output. Each value is the region's own count plus the values averaged over each neighbourhood ring, with each ring's contribution diminished by the square of its distance.
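A minimal sketch of this squashing step, following the description above literally (ring average weighted by 1 / distance²) on hypothetical toy counts for two features. The library's exact weighting and averaging details may differ, so treat this as an illustration of the idea rather than srai's code.

```python
from typing import Dict, List

# Hypothetical toy input for one region: its own counts (distance 0) and the
# count vectors of its neighbours, grouped by ring distance.
own_counts: List[float] = [1.0, 8.0]  # e.g. [leisure, transportation]
neighbours_by_distance: Dict[int, List[List[float]]] = {
    1: [[2.0, 4.0], [0.0, 6.0]],
    2: [[4.0, 0.0], [0.0, 8.0], [2.0, 4.0], [2.0, 0.0]],
}


def squash(own: List[float], rings: Dict[int, List[List[float]]]) -> List[float]:
    """Add each ring's averaged counts to the region's own counts, weighted by 1 / distance**2."""
    result = list(own)  # copy, so the input is not mutated
    for distance, vectors in rings.items():
        if not vectors:
            continue  # no neighbours at this distance
        for i in range(len(result)):
            average = sum(v[i] for v in vectors) / len(vectors)
            result[i] += average / distance**2
    return result


print(squash(own_counts, neighbours_by_distance))  # [2.5, 13.75]
```

Ring 1 adds its averages [1.0, 5.0] at full weight, while ring 2's averages [2.0, 3.0] are divided by 4, so farther neighbours have a smaller impact on the final vector.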
cce = ContextualCountEmbedder(
neighbourhood=h3n, neighbourhood_distance=10, concatenate_vectors=False
)
embeddings = cce.transform(regions_gdf, features_gdf, joint_gdf)
embeddings
region_id | aerialway | airports | sustenance | education | transportation | finances | healthcare | culture_art_entertainment | other | buildings | emergency | historic | leisure | shops | sport | tourism | greenery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
89393362887ffff | 0.034722 | 0.000000 | 2.269353 | 0.439062 | 12.628285 | 0.180062 | 0.359510 | 0.074466 | 0.589209 | 1.515361 | 0.000461 | 1.630729 | 6.349431 | 2.680176 | 1.452460 | 5.003820 | 4.626360 |
893933675d7ffff | 0.009020 | 0.003138 | 0.443033 | 1.299662 | 6.234084 | 0.069550 | 0.125526 | 0.027383 | 0.169019 | 5.727367 | 0.012956 | 0.049820 | 2.483650 | 0.588432 | 1.651177 | 1.422769 | 3.066563 |
89393362b9bffff | 0.001426 | 0.000000 | 10.974734 | 1.568049 | 17.594330 | 0.982621 | 1.885864 | 0.224930 | 2.870345 | 1.671898 | 0.000000 | 2.278367 | 2.770693 | 10.478793 | 0.354697 | 1.492560 | 2.078291 |
89393375e7bffff | 0.001705 | 0.026483 | 2.899253 | 2.665356 | 18.173395 | 0.625617 | 2.551422 | 0.027626 | 2.189496 | 2.741207 | 0.003333 | 2.544245 | 2.332379 | 2.319567 | 0.830177 | 1.100986 | 4.822897 |
89393362b47ffff | 0.000000 | 0.000000 | 30.559139 | 1.494721 | 45.892861 | 3.045202 | 0.690918 | 5.290941 | 3.852949 | 4.658366 | 0.000000 | 5.691184 | 13.638597 | 6.009587 | 0.218136 | 6.490396 | 4.293890 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
89393375977ffff | 0.017440 | 0.016935 | 2.070630 | 0.642945 | 6.923506 | 0.193499 | 2.333959 | 0.018138 | 0.301379 | 1.338160 | 0.098571 | 0.087337 | 6.918314 | 3.433757 | 0.644636 | 0.737992 | 6.099323 |
8939337593bffff | 0.013735 | 0.015131 | 0.734981 | 3.348821 | 12.929066 | 0.127047 | 0.228607 | 0.015770 | 3.092170 | 2.171014 | 0.065808 | 0.075753 | 4.445317 | 0.977656 | 2.111079 | 2.514135 | 2.715795 |
89393362dc7ffff | 0.011944 | 0.002270 | 7.448524 | 1.502370 | 16.884160 | 2.471997 | 2.652232 | 0.049050 | 1.231567 | 2.819407 | 0.002593 | 1.178034 | 5.428327 | 3.798023 | 1.052855 | 0.655873 | 7.265484 |
8939337582fffff | 0.002772 | 0.112517 | 7.296012 | 3.329005 | 50.895200 | 2.044297 | 1.102103 | 0.061298 | 0.116477 | 0.203648 | 0.006098 | 2.120827 | 7.952413 | 4.359110 | 0.626138 | 0.167047 | 21.410751 |
89393375eabffff | 0.000000 | 0.049129 | 2.406643 | 0.637230 | 6.113690 | 0.168946 | 0.376918 | 0.093360 | 0.153909 | 0.163972 | 0.001145 | 0.194110 | 3.849913 | 0.677553 | 0.595808 | 0.247019 | 3.500803 |
830 rows × 17 columns
Concatenated vector version
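The embedder returns a vector of length n * (distance + 1), where n is the number of features from the CountEmbedder and distance is the number of neighbourhood rings analysed (neighbourhood_distance); the extra block holds the region's own counts at distance 0. Each feature name is suffixed with _d, where d is the distance, so here 17 features × 11 distances (0 to 10) give 187 columns. Values at distance d are averaged over all neighbours at that distance.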
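A minimal sketch of how such a wide row could be assembled, using hypothetical toy counts for two features. Filling a distance with 0.0 when it has no neighbours is an assumption of this sketch, not documented srai behaviour.

```python
from typing import Dict, List

feature_names: List[str] = ["leisure", "transportation"]
own_counts: List[float] = [1.0, 8.0]  # counts of the region itself (distance 0)
neighbours_by_distance: Dict[int, List[List[float]]] = {
    1: [[2.0, 4.0], [0.0, 6.0]],
    2: [[4.0, 0.0], [0.0, 8.0], [2.0, 4.0], [2.0, 0.0]],
}


def concatenate(
    names: List[str], own: List[float], rings: Dict[int, List[List[float]]], max_distance: int
) -> Dict[str, float]:
    """Build one column per (feature, distance) pair, suffixed with _<distance>."""
    row: Dict[str, float] = {f"{name}_0": value for name, value in zip(names, own)}
    for distance in range(1, max_distance + 1):
        vectors = rings.get(distance, [])
        for i, name in enumerate(names):
            # Plain average over the neighbours at this distance (0.0 if there are none)
            row[f"{name}_{distance}"] = sum(v[i] for v in vectors) / len(vectors) if vectors else 0.0
    return row


print(concatenate(feature_names, own_counts, neighbours_by_distance, max_distance=2))
```

With 17 features and max_distance=10 this layout yields 17 × 11 = 187 columns, matching the shape of wide_embeddings below.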
wide_cce = ContextualCountEmbedder(
neighbourhood=h3n, neighbourhood_distance=10, concatenate_vectors=True
)
wide_embeddings = wide_cce.transform(regions_gdf, features_gdf, joint_gdf)
wide_embeddings
region_id | aerialway_0 | airports_0 | sustenance_0 | education_0 | transportation_0 | finances_0 | healthcare_0 | culture_art_entertainment_0 | other_0 | buildings_0 | ... | culture_art_entertainment_10 | other_10 | buildings_10 | emergency_10 | historic_10 | leisure_10 | shops_10 | sport_10 | tourism_10 | greenery_10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
89393362887ffff | 0.0 | 0.0 | 1.0 | 0.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.340000 | 0.840000 | 1.400000 | 0.000000 | 1.060000 | 4.380000 | 4.360000 | 1.100000 | 2.060000 | 3.480000 |
893933675d7ffff | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | ... | 0.172414 | 0.448276 | 0.551724 | 0.034483 | 0.379310 | 4.413793 | 4.827586 | 1.931034 | 0.862069 | 3.275862 |
89393362b9bffff | 0.0 | 0.0 | 6.0 | 1.0 | 10.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | ... | 0.244444 | 0.844444 | 1.155556 | 0.000000 | 1.288889 | 4.177778 | 5.000000 | 1.177778 | 3.155556 | 4.822222 |
89393375e7bffff | 0.0 | 0.0 | 2.0 | 2.0 | 12.0 | 0.0 | 2.0 | 0.0 | 2.0 | 2.0 | ... | 0.200000 | 0.400000 | 0.925000 | 0.000000 | 0.600000 | 3.825000 | 3.050000 | 0.650000 | 1.825000 | 7.650000 |
89393362b47ffff | 0.0 | 0.0 | 23.0 | 1.0 | 39.0 | 2.0 | 0.0 | 5.0 | 3.0 | 3.0 | ... | 0.121212 | 0.696970 | 1.242424 | 0.000000 | 0.575758 | 3.303030 | 7.242424 | 0.787879 | 2.939394 | 4.181818 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
89393375977ffff | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | 0.0 | 1.0 | ... | 0.000000 | 0.105263 | 0.842105 | 0.000000 | 0.000000 | 1.368421 | 0.157895 | 0.578947 | 0.157895 | 4.473684 |
8939337593bffff | 0.0 | 0.0 | 0.0 | 3.0 | 9.0 | 0.0 | 0.0 | 0.0 | 3.0 | 2.0 | ... | 0.000000 | 0.277778 | 0.833333 | 0.055556 | 0.055556 | 1.666667 | 0.833333 | 0.777778 | 0.500000 | 6.388889 |
89393362dc7ffff | 0.0 | 0.0 | 6.0 | 1.0 | 12.0 | 2.0 | 2.0 | 0.0 | 1.0 | 2.0 | ... | 0.282051 | 0.435897 | 2.153846 | 0.000000 | 0.794872 | 4.461538 | 6.230769 | 2.179487 | 2.153846 | 3.256410 |
8939337582fffff | 0.0 | 0.0 | 7.0 | 3.0 | 43.0 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.064516 | 0.322581 | 0.838710 | 0.000000 | 0.225806 | 3.354839 | 2.451613 | 0.967742 | 1.096774 | 3.677419 |
89393375eabffff | 0.0 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.172414 | 0.517241 | 0.655172 | 0.000000 | 0.689655 | 5.862069 | 1.689655 | 1.862069 | 1.310345 | 11.241379 |
830 rows × 187 columns
Plotting example features
plot_numeric_data(regions_gdf, "leisure", embeddings)
plot_numeric_data(regions_gdf, "transportation", embeddings)