Contextual count embedder
from srai.loaders.osm_loaders import OSMPbfLoader
from srai.regionalizers import H3Regionalizer
from srai.joiners import IntersectionJoiner
from srai.embedders import ContextualCountEmbedder
from srai.plotting.folium_wrapper import plot_regions, plot_numeric_data
from srai.neighbourhoods import H3Neighbourhood
Data preparation
In order to use ContextualCountEmbedder, we need to prepare some data. Namely, we need regions_gdf, features_gdf, and joint_gdf. These are the outputs of Regionalizers, Loaders, and Joiners, respectively.
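As a rough mental model of how these three inputs fit together, the sketch below uses plain Python dictionaries in place of GeoDataFrames and omits geometries entirely; this is only an assumption to keep it self-contained. It counts, per region, the joined features that have a value in each group column. That is similar in spirit to the per-region counts CountEmbedder produces, but it is a toy, not srai's implementation, and all region and feature ids are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the three inputs (plain dicts instead of GeoDataFrames).
regions: List[str] = ["r1", "r2", "r3"]  # like regions_gdf: region ids only
features: Dict[str, Dict[str, str]] = {  # like features_gdf: a missing key plays the role of NaN
    "node/1": {"transportation": "public_transport=stop_position"},
    "node/2": {"sustenance": "amenity=cafe"},
    "way/3": {"leisure": "leisure=park", "greenery": "leisure=park"},
}
# Like joint_gdf: (region_id, feature_id) pairs; one feature may belong to several regions.
joint: List[Tuple[str, str]] = [("r1", "node/1"), ("r1", "node/2"), ("r2", "way/3"), ("r1", "way/3")]


def count_features(regions, features, joint):
    """Count, per region, the joined features with a non-missing value in each column."""
    columns = sorted({col for tags in features.values() for col in tags})
    counts = {region_id: Counter() for region_id in regions}
    for region_id, feature_id in joint:
        for col in features[feature_id]:
            counts[region_id][col] += 1
    # Regions without any joined feature still get a row of zeros.
    return {r: [counts[r][c] for c in columns] for r in regions}, columns


rows, columns = count_features(regions, features, joint)
print(columns)
for region_id, values in rows.items():
    print(region_id, values)
```

Here r1 gets one count in every column, r2 counts only the park (in both leisure and greenery), and r3 stays all zeros.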
from srai.regionalizers import geocode_to_region_gdf
area_gdf = geocode_to_region_gdf("Lisboa, PT")
plot_regions(area_gdf)
Regionalize the area using an H3Regionalizer
regionalizer = H3Regionalizer(resolution=9, buffer=True)
regions_gdf = regionalizer.transform(area_gdf)
regions_gdf
region_id | geometry |
---|---|
89393362b9bffff | POLYGON ((-9.16462 38.72266, -9.16632 38.72132... |
893933664c3ffff | POLYGON ((-9.08758 38.79498, -9.08928 38.79364... |
893933628b7ffff | POLYGON ((-9.16908 38.74269, -9.17078 38.74135... |
89393362e37ffff | POLYGON ((-9.19246 38.72899, -9.19416 38.72765... |
8939336744bffff | POLYGON ((-9.12732 38.73318, -9.12903 38.73184... |
... | ... |
8939337585bffff | POLYGON ((-9.14318 38.77482, -9.14489 38.77348... |
8939336767bffff | POLYGON ((-9.12690 38.71234, -9.12860 38.71100... |
89393362b23ffff | POLYGON ((-9.14483 38.71470, -9.14654 38.71336... |
89393375c7bffff | POLYGON ((-9.15275 38.79083, -9.15446 38.78949... |
893933628c7ffff | POLYGON ((-9.18333 38.73383, -9.18503 38.73249... |
830 rows × 1 columns
Download some objects from OpenStreetMap
You can use both OsmTagsFilter and GroupedOsmTagsFilter filters. In this example, the predefined GroupedOsmTagsFilter BASE_OSM_GROUPS_FILTER is used.
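To make the two filter shapes more concrete: roughly, an OsmTagsFilter maps an OSM key to True (any value), a single value, or a list of values, and a GroupedOsmTagsFilter maps a group name to such a filter. The sketch below uses a hypothetical grouped filter (not the contents of BASE_OSM_GROUPS_FILTER) and a toy matcher, match_groups, written only for illustration. It shows how one feature's tags can end up as key=value strings under one or more group columns. The toy keeps only the first matching tag per group, which may not be how srai resolves multiple matches.

```python
from typing import Dict, List, Union

TagRule = Union[bool, str, List[str]]
TagFilter = Dict[str, TagRule]          # OSM key -> True / single value / list of values
GroupedTagFilter = Dict[str, TagFilter]  # group name -> tag filter

# Hypothetical grouped filter, NOT the contents of BASE_OSM_GROUPS_FILTER.
my_grouped_filter: GroupedTagFilter = {
    "transportation": {"public_transport": "stop_position", "highway": ["bus_stop"]},
    "greenery": {"leisure": ["park", "garden"], "landuse": "grass"},
    "buildings": {"building": True},
    "leisure": {"leisure": True},
}


def value_matches(rule: TagRule, value: str) -> bool:
    """True accepts any value; a string or list restricts the accepted values."""
    if rule is True:
        return True
    if isinstance(rule, str):
        return value == rule
    if isinstance(rule, list):
        return value in rule
    return False


def match_groups(tags: Dict[str, str], grouped: GroupedTagFilter) -> Dict[str, str]:
    """Return {group: 'key=value'} using the first matching tag of each group."""
    result: Dict[str, str] = {}
    for group, tag_filter in grouped.items():
        for key, rule in tag_filter.items():
            if key in tags and value_matches(rule, tags[key]):
                result[group] = f"{key}={tags[key]}"
                break
    return result


print(match_groups({"leisure": "park", "name": "Jardim"}, my_grouped_filter))
print(match_groups({"building": "commercial"}, my_grouped_filter))
```

Because groups may overlap, a park lands in both greenery and leisure, much like the leisure=park row in the features table.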
from srai.loaders.osm_loaders.filters import BASE_OSM_GROUPS_FILTER
loader = OSMPbfLoader()
features_gdf = loader.load(area_gdf, tags=BASE_OSM_GROUPS_FILTER)
features_gdf
[Lisbon, Portugal] Downloading pbf file #1 (Elements): 100%|██████████| 1307840/1307840 [00:04<00:00, 301813.75it/s]
ccfc4ec912ac803c97b939feba28e0a57de61e0e543e36e51c010f9f0a167e37.osm.pbf: 100%|██████████| 7.39M/7.39M [00:02<00:00, 3.59MiB/s]
[Lisbon, Portugal] Counting pbf features: 778396it [00:03, 216749.00it/s]
[Lisbon, Portugal] Parsing pbf file #1: 100%|██████████| 778396/778396 [00:18<00:00, 43182.55it/s]
Grouping features: 100%|██████████| 18/18 [00:00<00:00, 24.65it/s]
feature_id | geometry | aerialway | airports | sustenance | education | transportation | finances | healthcare | culture_art_entertainment | other | buildings | emergency | historic | leisure | shops | sport | tourism | greenery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
node/21433772 | POINT (-9.19059 38.72880) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/21433776 | POINT (-9.19376 38.72666) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414208 | POINT (-9.16663 38.74018) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414256 | POINT (-9.10286 38.74711) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/25414265 | POINT (-9.10273 38.74707) | NaN | NaN | NaN | NaN | public_transport=stop_position | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
relation/8131598 | MULTIPOLYGON (((-9.14676 38.74328, -9.14662 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | leisure=garden | NaN | NaN | NaN | NaN |
relation/16158578 | MULTIPOLYGON (((-9.15193 38.72702, -9.15123 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | landuse=grass |
relation/16201238 | MULTIPOLYGON (((-9.13806 38.74037, -9.13809 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | amenity=place_of_worship | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/7117461 | MULTIPOLYGON (((-9.13065 38.74680, -9.13036 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | leisure=park | NaN | NaN | NaN | leisure=park |
relation/16527291 | MULTIPOLYGON (((-9.16432 38.73804, -9.16413 38... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | building=commercial | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
24148 rows × 18 columns
Join the objects with the regions they belong to
joiner = IntersectionJoiner()
joint_gdf = joiner.transform(regions_gdf, features_gdf)
joint_gdf
region_id | feature_id |
---|---|
89393362b9bffff | node/9730267976 |
89393362b9bffff | node/9730267977 |
89393362b9bffff | node/9730267978 |
89393362b9bffff | node/9730267979 |
89393362b9bffff | way/755768797 |
... | ... |
89393362b23ffff | node/4285908290 |
89393362b23ffff | node/2931441119 |
89393362b23ffff | node/1194934441 |
89393375c7bffff | way/420183300 |
89393375c7bffff | way/429399851 |
27026 rows × 0 columns
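joint_gdf holds only a (region_id, feature_id) index, which is why it reports 0 columns. A feature that crosses region boundaries is paired with every region it intersects, which is how the join can have more rows (27026) than there are features (24148). The toy join below illustrates this with axis-aligned boxes standing in for the hexagons and OSM geometries. That is a simplifying assumption; IntersectionJoiner works on the real geometries, and boxes_intersect and intersection_join are hypothetical helpers, not srai functions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def boxes_intersect(a: Box, b: Box) -> bool:
    """Closed boxes: touching edges count as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersection_join(regions: Dict[str, Box], features: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Return sorted (region_id, feature_id) pairs for every intersecting region/feature."""
    return sorted(
        (region_id, feature_id)
        for region_id, region_box in regions.items()
        for feature_id, feature_box in features.items()
        if boxes_intersect(region_box, feature_box)
    )


regions = {"a": (0.0, 0.0, 1.0, 1.0), "b": (1.0, 0.0, 2.0, 1.0)}
features = {
    "node/1": (0.5, 0.5, 0.5, 0.5),  # a point inside region "a"
    "way/2": (0.8, 0.2, 1.5, 0.4),   # a line crossing from "a" into "b"
    "node/3": (5.0, 5.0, 5.0, 5.0),  # outside every region, so it is dropped
}
pairs = intersection_join(regions, features)
print(pairs)
```

Here two features produce three pairs, because way/2 is joined to both regions.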
Embed using features existing in data
ContextualCountEmbedder extends the capabilities of the basic CountEmbedder by incorporating the neighbourhood of each embedded region. In this example, we will use the H3Neighbourhood.
h3n = H3Neighbourhood()
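A neighbourhood answers the question "which cells lie exactly k grid steps from this one?", i.e. a ring at distance k. The toy below reproduces that idea on an infinite hexagonal grid in axial coordinates. It ignores H3's pentagons and cell indexing, and hex_distance and neighbours_at_distance are helpers written for this sketch, not srai functions. Away from pentagons, the ring at distance k contains 6k cells.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # axial coordinates (q, r) on a toy hexagonal grid


def hex_distance(a: Cell, b: Cell) -> int:
    """Grid distance between two hexagons in axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def neighbours_at_distance(cell: Cell, k: int) -> Set[Cell]:
    """All cells exactly k steps away, found by scanning the surrounding rhombus."""
    q0, r0 = cell
    return {
        (q, r)
        for q in range(q0 - k, q0 + k + 1)
        for r in range(r0 - k, r0 + k + 1)
        if hex_distance(cell, (q, r)) == k
    }


for k in range(4):
    print(k, len(neighbours_at_distance((0, 0), k)))
# 0 1
# 1 6
# 2 12
# 3 18
```

With neighbourhood_distance=10, the embedder looks at rings 1 through 10 around each region, and a full outermost ring on such a grid would hold 60 cells.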
Squashed vector version (default)
The embedder returns a vector of the same length as CountEmbedder would, but adds to each region's own counts the values averaged over its neighbours at each distance, diminished by the square of that neighbour distance.
cce = ContextualCountEmbedder(
neighbourhood=h3n, neighbourhood_distance=10, concatenate_vectors=False
)
embeddings = cce.transform(regions_gdf, features_gdf, joint_gdf)
embeddings
region_id | aerialway | airports | sustenance | education | transportation | finances | healthcare | culture_art_entertainment | other | buildings | emergency | historic | leisure | shops | sport | tourism | greenery |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
89393362b9bffff | 0.001426 | 0.000000 | 10.975670 | 1.568049 | 17.608729 | 0.983188 | 1.885864 | 0.224930 | 2.870345 | 1.671898 | 0.000000 | 2.278367 | 2.770693 | 10.478976 | 0.354697 | 1.492560 | 2.078291 |
893933664c3ffff | 0.017925 | 0.001271 | 0.420144 | 0.179096 | 1.624246 | 0.067765 | 0.076258 | 0.010264 | 0.040404 | 0.090487 | 0.021654 | 0.018993 | 1.366620 | 0.475141 | 0.522997 | 0.237360 | 0.863500 |
893933628b7ffff | 0.143519 | 0.000138 | 5.281330 | 0.419329 | 70.580654 | 0.545770 | 1.543742 | 0.141519 | 1.463537 | 1.977067 | 0.000752 | 0.491895 | 3.090419 | 7.022000 | 1.622302 | 5.740091 | 26.502221 |
89393362e37ffff | 0.001426 | 0.000000 | 0.289239 | 0.096729 | 5.216950 | 0.041555 | 0.062309 | 0.031261 | 0.189253 | 0.109838 | 0.000000 | 1.090470 | 3.405066 | 0.337004 | 1.222539 | 1.115232 | 0.756340 |
8939336744bffff | 0.000000 | 0.000000 | 20.000527 | 0.560251 | 26.877968 | 2.647558 | 1.774219 | 1.131219 | 0.621619 | 0.736705 | 0.000230 | 0.419809 | 2.676785 | 33.010598 | 0.552928 | 1.520889 | 9.813354 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
8939337585bffff | 0.000000 | 1.388443 | 0.246986 | 0.221782 | 5.142120 | 0.065209 | 0.154147 | 0.020404 | 0.069686 | 0.135453 | 0.000799 | 0.054402 | 1.015141 | 0.286173 | 0.351061 | 0.187906 | 4.467682 |
8939336767bffff | 0.000000 | 0.000000 | 28.774025 | 0.254190 | 25.356870 | 4.147838 | 2.510051 | 0.424492 | 2.585152 | 4.501995 | 0.000000 | 6.798468 | 1.352781 | 13.698240 | 0.309200 | 12.353318 | 3.680504 |
89393362b23ffff | 0.000000 | 0.000000 | 147.848219 | 2.743022 | 21.734827 | 2.279263 | 2.155266 | 5.662260 | 4.002520 | 3.837253 | 0.000000 | 19.779530 | 4.130932 | 36.115390 | 0.264303 | 28.490555 | 3.196619 |
89393375c7bffff | 0.000000 | 0.060634 | 0.240982 | 1.258147 | 3.495978 | 0.043332 | 0.125474 | 0.019036 | 0.126361 | 1.161524 | 0.000344 | 0.148558 | 5.165163 | 1.619219 | 2.402199 | 0.156859 | 1.716225 |
893933628c7ffff | 0.003912 | 0.000000 | 0.518926 | 0.132009 | 1.846715 | 0.082485 | 0.109140 | 0.026688 | 0.140199 | 0.178526 | 0.000000 | 0.219968 | 2.500760 | 0.625300 | 0.312026 | 1.240764 | 0.874453 |
830 rows × 17 columns
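For a single region, the squashed variant described above can be sketched as: own counts plus, for each distance d from 1 to neighbourhood_distance, the average count vector of the neighbours at distance d divided by d**2. The sketch follows that description with plain lists. The counts and the rings mapping are hypothetical, and neighbours missing from counts are skipped. srai's exact weighting (for example, whether the distance is offset before squaring) may differ, so treat this as an illustration rather than a reference implementation.

```python
from typing import Dict, List, Sequence


def ring_average(counts: Dict[str, List[float]], ring: Sequence[str]) -> List[float]:
    """Average the count vectors of the ring members that are present in `counts`."""
    present = [counts[region_id] for region_id in ring if region_id in counts]
    width = len(next(iter(counts.values())))
    if not present:
        return [0.0] * width  # assumption: an empty ring contributes nothing
    return [sum(column) / len(present) for column in zip(*present)]


def squashed_embedding(
    counts: Dict[str, List[float]],
    region_id: str,
    rings: Dict[int, List[str]],
    max_distance: int,
) -> List[float]:
    """Own counts plus each ring's average divided by the squared distance."""
    result = [float(v) for v in counts[region_id]]
    for distance in range(1, max_distance + 1):
        averaged = ring_average(counts, rings.get(distance, []))
        result = [r + a / distance**2 for r, a in zip(result, averaged)]
    return result


# Hypothetical counts for two feature groups and the rings around region "c".
counts = {"c": [2.0, 0.0], "n1": [4.0, 2.0], "n2": [0.0, 2.0], "m1": [8.0, 0.0]}
rings = {1: ["n1", "n2", "outside"], 2: ["m1"]}
print(squashed_embedding(counts, "c", rings, max_distance=2))  # [6.0, 2.0]
```

The output keeps the input width (two groups here, 17 in the table above), which is what distinguishes this variant from the concatenated one.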
Concatenated vector version
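The embedder returns a vector of length n * (distance + 1), where n is the number of features from the CountEmbedder and distance is the number of neighbourhood rings analysed (neighbourhood_distance); the extra 1 accounts for the region itself at distance 0. Each feature is suffixed with _d, where d is the distance, so the _0 columns hold the region's own counts. Values at each distance are averaged over all neighbours at that distance. Here, 17 features over distances 0 to 10 give 17 × 11 = 187 columns.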
wide_cce = ContextualCountEmbedder(
neighbourhood=h3n, neighbourhood_distance=10, concatenate_vectors=True
)
wide_embeddings = wide_cce.transform(regions_gdf, features_gdf, joint_gdf)
wide_embeddings
region_id | aerialway_0 | airports_0 | sustenance_0 | education_0 | transportation_0 | finances_0 | healthcare_0 | culture_art_entertainment_0 | other_0 | buildings_0 | ... | culture_art_entertainment_10 | other_10 | buildings_10 | emergency_10 | historic_10 | leisure_10 | shops_10 | sport_10 | tourism_10 | greenery_10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
89393362b9bffff | 0.0 | 0.0 | 6.0 | 1.0 | 10.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | ... | 0.244444 | 0.844444 | 1.155556 | 0.000000 | 1.288889 | 4.177778 | 5.022222 | 1.177778 | 3.155556 | 4.822222 |
893933664c3ffff | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.153846 | 0.461538 | 0.615385 | 0.000000 | 0.538462 | 3.384615 | 0.692308 | 0.923077 | 1.384615 | 11.692308 |
893933628b7ffff | 0.0 | 0.0 | 3.0 | 0.0 | 64.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.200000 | 0.950000 | 1.216667 | 0.000000 | 1.166667 | 3.466667 | 5.833333 | 0.800000 | 2.583333 | 3.866667 |
89393362e37ffff | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.222222 | 0.822222 | 2.555556 | 0.000000 | 1.555556 | 6.288889 | 2.511111 | 1.044444 | 3.177778 | 4.511111 |
8939336744bffff | 0.0 | 0.0 | 16.0 | 0.0 | 20.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | ... | 0.305556 | 0.666667 | 1.666667 | 0.027778 | 0.972222 | 3.305556 | 5.305556 | 0.638889 | 2.000000 | 7.083333 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
8939337585bffff | 0.0 | 1.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.088235 | 0.264706 | 1.000000 | 0.000000 | 0.176471 | 3.147059 | 2.029412 | 1.147059 | 0.764706 | 5.029412 |
8939336767bffff | 0.0 | 0.0 | 16.0 | 0.0 | 17.0 | 3.0 | 2.0 | 0.0 | 1.0 | 3.0 | ... | 0.307692 | 1.000000 | 2.692308 | 0.000000 | 1.500000 | 5.269231 | 5.115385 | 0.846154 | 2.307692 | 3.692308 |
89393362b23ffff | 0.0 | 0.0 | 129.0 | 2.0 | 11.0 | 0.0 | 1.0 | 5.0 | 2.0 | 1.0 | ... | 0.212121 | 0.393939 | 1.454545 | 0.000000 | 0.424242 | 3.181818 | 3.151515 | 0.878788 | 1.606061 | 3.545455 |
89393375c7bffff | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 0.083333 | 0.291667 | 1.208333 | 0.041667 | 0.458333 | 4.291667 | 1.083333 | 1.375000 | 0.500000 | 8.083333 |
893933628c7ffff | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.625000 | 1.062500 | 2.875000 | 0.000000 | 1.937500 | 5.229167 | 7.062500 | 1.458333 | 3.875000 | 8.541667 |
830 rows × 187 columns
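For the concatenated variant, the sketch below builds the column names in the same distance-major order as the table above and checks the width: 17 groups over distances 0 to 10 give 187 columns. The vector itself is the region's own counts followed by the neighbour average at each distance. The toy counts, the rings mapping, and treating an empty ring as zeros are assumptions of this sketch, not documented srai behaviour.

```python
from typing import Dict, List, Sequence

# The 17 group columns shown in the tables above.
GROUPS = [
    "aerialway", "airports", "sustenance", "education", "transportation",
    "finances", "healthcare", "culture_art_entertainment", "other", "buildings",
    "emergency", "historic", "leisure", "shops", "sport", "tourism", "greenery",
]


def concatenated_columns(base: Sequence[str], max_distance: int) -> List[str]:
    """Distance-major names: every *_0 column first, then *_1, up to *_{max_distance}."""
    return [f"{name}_{d}" for d in range(max_distance + 1) for name in base]


def ring_average(counts: Dict[str, List[float]], ring: Sequence[str]) -> List[float]:
    """Average the count vectors of the ring members that are present in `counts`."""
    present = [counts[region_id] for region_id in ring if region_id in counts]
    width = len(next(iter(counts.values())))
    if not present:
        return [0.0] * width  # assumption: an empty ring yields zeros
    return [sum(column) / len(present) for column in zip(*present)]


def concatenated_embedding(
    counts: Dict[str, List[float]], region_id: str, rings: Dict[int, List[str]], max_distance: int
) -> List[float]:
    vector = [float(v) for v in counts[region_id]]  # distance 0: the region's own counts
    for distance in range(1, max_distance + 1):
        vector.extend(ring_average(counts, rings.get(distance, [])))
    return vector


columns = concatenated_columns(GROUPS, max_distance=10)
print(len(columns), columns[0], columns[-1])  # 187 aerialway_0 greenery_10

# Hypothetical counts for two feature groups and the rings around region "c".
counts = {"c": [2.0, 0.0], "n1": [4.0, 2.0], "n2": [0.0, 2.0], "m1": [8.0, 0.0]}
rings = {1: ["n1", "n2", "outside"], 2: ["m1"]}
print(concatenated_embedding(counts, "c", rings, max_distance=2))
# [2.0, 0.0, 2.0, 2.0, 8.0, 0.0]
```

Unlike the squashed variant, no distance weighting is applied; each ring keeps its own block of columns, so a downstream model can learn how much each distance matters.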
Plotting example features
plot_numeric_data(regions_gdf, "leisure", embeddings)
plot_numeric_data(regions_gdf, "transportation", embeddings)