Wide format
OvertureMaestro can transform downloaded data into a wide format. This format is intended for geospatial machine learning: selected datasets are pivoted on their categories, so that each category becomes its own column.
This notebook explains what this format is and how to work with it.
New functions
The new module contains the same set of functions as the basic API, with wide_form added to each name:
- convert_geometry_to_parquet → convert_geometry_to_wide_form_parquet
- convert_geometry_to_geodataframe → convert_geometry_to_wide_form_geodataframe
- other functions ...
Additionally, special functions for downloading all available datasets are available:
- convert_geometry_to_wide_form_parquet_for_all_types
- convert_geometry_to_wide_form_geodataframe_for_all_types
- convert_bounding_box_to_wide_form_parquet_for_all_types
- convert_bounding_box_to_wide_form_geodataframe_for_all_types
You can import them from the overturemaestro.advanced_functions module.
from overturemaestro import convert_geometry_to_geodataframe, geocode_to_geometry
from overturemaestro.advanced_functions import convert_geometry_to_wide_form_geodataframe
What is the wide format?
In this section we will compare the original data format with the wide format, using water data as an example.
Let's start by looking at the official Overture Maps schema for the base water data type:
import requests
import yaml
response = requests.get(
"https://raw.githubusercontent.com/OvertureMaps/schema/refs/tags/v1.4.0/schema/base/water.yaml",
allow_redirects=True,
)
water_schema = yaml.safe_load(response.content.decode("utf-8"))
water_schema
{'$schema': 'https://json-schema.org/draft/2020-12/schema',
'title': 'water',
'description': 'Physical representations of inland and ocean marine surfaces. Translates `natural` and `waterway` tags from OpenStreetMap.',
'type': 'object',
'properties': {'id': {'$ref': '../defs.yaml#/$defs/propertyDefinitions/id'},
'geometry': {'unevaluatedProperties': False,
'oneOf': [{'$ref': 'https://geojson.org/schema/Point.json'},
{'$ref': 'https://geojson.org/schema/LineString.json'},
{'$ref': 'https://geojson.org/schema/Polygon.json'},
{'$ref': 'https://geojson.org/schema/MultiPolygon.json'}]},
'properties': {'unevaluatedProperties': False,
'allOf': [{'$ref': '../defs.yaml#/$defs/propertyContainers/overtureFeaturePropertiesContainer'},
{'$ref': '../defs.yaml#/$defs/propertyContainers/levelContainer'},
{'$ref': '../defs.yaml#/$defs/propertyContainers/namesContainer'},
{'$ref': './defs.yaml#/$defs/propertyContainers/osmPropertiesContainer'}],
'required': ['subtype', 'class'],
'properties': {'subtype': {'description': 'The type of water body such as an river, ocean or lake.',
'default': ['water'],
'type': 'string',
'enum': ['canal',
'human_made',
'lake',
'ocean',
'physical',
'pond',
'reservoir',
'river',
'spring',
'stream',
'wastewater',
'water']},
'class': {'description': 'Further description of the type of water',
'default': ['water'],
'enum': ['basin',
'bay',
'blowhole',
'canal',
'cape',
'ditch',
'dock',
'drain',
'fairway',
'fish_pass',
'fishpond',
'geyser',
'hot_spring',
'lagoon',
'lake',
'moat',
'ocean',
'oxbow',
'pond',
'reflecting_pool',
'reservoir',
'river',
'salt_pond',
'sea',
'sewage',
'shoal',
'spring',
'strait',
'stream',
'swimming_pool',
'tidal_channel',
'wastewater',
'water',
'water_storage',
'waterfall']},
'is_salt': {'description': 'Is it salt water or not', 'type': 'boolean'},
'is_intermittent': {'description': 'Is it intermittent water or not',
'type': 'boolean'}}}}}
The specification defines two required fields, subtype and class, and lists the allowed values for each.
Both fields describe what the feature is. Together with the theme and type, they form the path:
theme (base) → type (water) → subtype (e.g. reservoir) → class (e.g. basin).
Based on this hierarchy, all available values can be determined and mapped to columns.
In this way, you will obtain data in a wide format, where each feature defines what it is with boolean flags.
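The pivot described above can be illustrated on a toy table. This is a minimal sketch in plain pandas, not OvertureMaestro's implementation; the three rows and their ids are made-up sample values, and only the column naming pattern (theme|type|subtype|class) follows the text.

```python
import pandas as pd

# Toy stand-in for the original (long) water data; ids and rows are illustrative.
long_df = pd.DataFrame(
    {
        "subtype": ["human_made", "canal", "river"],
        "class": ["swimming_pool", "drain", "river"],
    },
    index=pd.Index(["a", "b", "c"], name="id"),
)

# Build the full column path for every feature: theme|type|subtype|class.
paths = "base|water|" + long_df["subtype"] + "|" + long_df["class"]

# One boolean column per distinct path; True marks the category of the feature.
wide_df = pd.get_dummies(paths).astype(bool)
print(wide_df)
```

This sketch only creates columns for categories present in the sample, whereas the library output in this notebook also contains columns for categories that have no features in the area.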
amsterdam = geocode_to_geometry("Amsterdam")
original_data = convert_geometry_to_geodataframe("base", "water", amsterdam)
wide_data = convert_geometry_to_wide_form_geodataframe("base", "water", amsterdam)
Finished operation in 0:00:08
Finished operation in 0:00:10
original_data
| id | geometry | bbox | version | sources | level | subtype | class | names | source_tags | wikidata | is_salt | is_intermittent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3e3314a0-c979-366b-94ed-16c0bafd3bd5 | POLYGON ((-72.99934 40.68825, -72.99967 40.688... | {'xmin': -74.00066375732422, 'xmax': -72.99933... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | ocean | ocean | {'primary': None, 'common': None, 'rules': None} | None | None | True | None |
| 7198db90-6b71-3792-8947-b581d6c5b5c4 | POLYGON ((-73.72785 40.66063, -73.72784 40.660... | {'xmin': -73.7278823852539, 'xmax': -73.727821... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | human_made | swimming_pool | {'primary': None, 'common': None, 'rules': None} | [(access, private), (leisure, swimming_pool)] | None | None | None |
| d3a3c77d-8078-3d98-b7b7-323323bacf44 | LINESTRING (-73.72846 40.66632, -73.72851 40.6... | {'xmin': -73.728515625, 'xmax': -73.7254257202... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | -1.0 | canal | drain | {'primary': 'Hook Creek', 'common': None, 'rul... | [(tunnel, culvert), (waterway, drain)] | None | None | None |
| 722d3468-4f50-37a7-b809-1f0c1e7814ab | POLYGON ((-73.72678 40.65868, -73.72682 40.658... | {'xmin': -73.7268295288086, 'xmax': -73.726760... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | human_made | swimming_pool | {'primary': None, 'common': None, 'rules': None} | [(access, private), (leisure, swimming_pool)] | None | None | None |
| 61f8df08-22b7-309b-a39d-1e27ffdea4c1 | POLYGON ((-73.72903 40.66007, -73.72902 40.660... | {'xmin': -73.72909545898438, 'xmax': -73.72901... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | human_made | swimming_pool | {'primary': None, 'common': None, 'rules': None} | [(access, private), (leisure, swimming_pool)] | None | None | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1b55cc6a-4875-35bb-973c-7f78107e9349 | POLYGON ((-73.81194 40.88809, -73.81196 40.888... | {'xmin': -73.81196594238281, 'xmax': -73.81179... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | human_made | swimming_pool | {'primary': None, 'common': None, 'rules': None} | [(access, private), (leisure, swimming_pool), ... | None | None | None |
| 710cc072-2f90-3f24-847d-d994d2ff23e2 | POLYGON ((-73.91045 40.91526, -73.91018 40.915... | {'xmin': -73.94308471679688, 'xmax': -73.88537... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | river | river | {'primary': None, 'common': None, 'rules': None} | [(intermittent, no), (natural, water), (source... | None | None | False |
| 79dced11-730d-34b5-bb9c-803ddfeb40f0 | LINESTRING (-73.88916 41.04345, -73.89008 41.0... | {'xmin': -73.93392944335938, 'xmax': -73.88914... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | river | river | {'primary': 'Hudson River', 'common': [('es', ... | [(boat, yes), (canoe, yes), (canoe:description... | None | None | False |
| e0ef7fc9-ba8a-3da1-b5c9-775faf05dca3 | POLYGON ((-73.64509 41.0005, -73.64511 41.0004... | {'xmin': -74.00066375732422, 'xmax': -72.99933... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | ocean | ocean | {'primary': None, 'common': None, 'rules': None} | None | None | True | None |
| 6b67cfae-059c-387b-9391-14e8d4946a7b | POLYGON ((-72.23316 41.16052, -72.2331 41.1604... | {'xmin': -73.80953979492188, 'xmax': -71.85736... | 0 | [{'property': '', 'dataset': 'OpenStreetMap', ... | NaN | physical | bay | {'primary': 'Long Island Sound', 'common': Non... | [(ele, 0), (gnis:feature_id, 977427), (natural... | Q867460 | None | None |
49065 rows × 12 columns
wide_data
| id | geometry | base\|water\|canal\|canal | base\|water\|canal\|ditch | base\|water\|canal\|drain | base\|water\|canal\|moat | base\|water\|human_made\|fish_pass | base\|water\|human_made\|reflecting_pool | base\|water\|human_made\|salt_pond | base\|water\|human_made\|swimming_pool | base\|water\|lake\|lagoon | ... | base\|water\|spring\|geyser | base\|water\|spring\|hot_spring | base\|water\|spring\|spring | base\|water\|stream\|stream | base\|water\|wastewater\|sewage | base\|water\|water\|dock | base\|water\|water\|fairway | base\|water\|water\|tidal_channel | base\|water\|water\|wastewater | base\|water\|water\|water |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3e3314a0-c979-366b-94ed-16c0bafd3bd5 | POLYGON ((-72.99934 40.68825, -72.99967 40.688... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 7198db90-6b71-3792-8947-b581d6c5b5c4 | POLYGON ((-73.72785 40.66063, -73.72784 40.660... | False | False | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | False | False |
| d3a3c77d-8078-3d98-b7b7-323323bacf44 | LINESTRING (-73.72846 40.66632, -73.72851 40.6... | False | False | True | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 722d3468-4f50-37a7-b809-1f0c1e7814ab | POLYGON ((-73.72678 40.65868, -73.72682 40.658... | False | False | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | False | False |
| 61f8df08-22b7-309b-a39d-1e27ffdea4c1 | POLYGON ((-73.72903 40.66007, -73.72902 40.660... | False | False | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1b55cc6a-4875-35bb-973c-7f78107e9349 | POLYGON ((-73.81194 40.88809, -73.81196 40.888... | False | False | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | False | False |
| 710cc072-2f90-3f24-847d-d994d2ff23e2 | POLYGON ((-73.91045 40.91526, -73.91018 40.915... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 79dced11-730d-34b5-bb9c-803ddfeb40f0 | LINESTRING (-73.88916 41.04345, -73.89008 41.0... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| e0ef7fc9-ba8a-3da1-b5c9-775faf05dca3 | POLYGON ((-73.64509 41.0005, -73.64511 41.0004... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 6b67cfae-059c-387b-9391-14e8d4946a7b | POLYGON ((-72.23316 41.16052, -72.2331 41.1604... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
49065 rows × 37 columns
With this format, we can quickly filter the data or count the number of features per category.
wide_data.drop(columns="geometry").sum().sort_values(ascending=False)
base|water|human_made|swimming_pool 46824
base|water|stream|stream 665
base|water|water|water 498
base|water|pond|pond 352
base|water|river|river 157
base|water|water|wastewater 123
base|water|canal|ditch 110
base|water|reservoir|basin 74
base|water|water|tidal_channel 65
base|water|physical|bay 53
base|water|canal|drain 37
base|water|physical|cape 30
base|water|water|fairway 14
base|water|canal|canal 14
base|water|physical|waterfall 11
base|water|physical|shoal 9
base|water|ocean|ocean 9
base|water|lake|lake 6
base|water|reservoir|reservoir 6
base|water|human_made|reflecting_pool 4
base|water|spring|spring 2
base|water|physical|strait 1
base|water|human_made|fish_pass 1
base|water|canal|moat 0
base|water|water|dock 0
base|water|wastewater|sewage 0
base|water|human_made|salt_pond 0
base|water|physical|ocean 0
base|water|spring|hot_spring 0
base|water|spring|geyser 0
base|water|physical|sea 0
base|water|lake|lagoon 0
base|water|reservoir|water_storage 0
base|water|lake|oxbow 0
base|water|pond|fishpond 0
base|water|spring|blowhole 0
dtype: int64
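Filtering works with ordinary boolean indexing, because every category column is a mask. The following is a small sketch on a hand-made frame; the four rows are illustrative, and only the column names are taken from the output above.

```python
import pandas as pd

# Hand-made wide-format frame; the rows are illustrative sample data.
wide_df = pd.DataFrame(
    {
        "base|water|human_made|swimming_pool": [True, False, True, False],
        "base|water|river|river": [False, True, False, False],
        "base|water|canal|drain": [False, False, False, True],
    },
    index=pd.Index(["a", "b", "c", "d"], name="id"),
)

pool_column = "base|water|human_made|swimming_pool"

# Keep only features flagged as swimming pools.
pools = wide_df[wide_df[pool_column]]

# Drop swimming pools and keep everything else.
without_pools = wide_df[~wide_df[pool_column]]

# Select every feature under one subtype by matching the column prefix.
canal_columns = [c for c in wide_df.columns if c.startswith("base|water|canal|")]
canals = wide_df[wide_df[canal_columns].any(axis=1)]
```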
Each theme/type pair has a defined list of hierarchy columns that is used to generate the final list of wide-format columns.
Most datasets use two columns (subtype and class), with three exceptions:
- base|land_cover → subtype only
- transportation|segment → subtype, class and subclass
- places|place → 1, 2, 3, ... (this one is described in detail below)
from overturemaestro.advanced_functions.wide_form import THEME_TYPE_CLASSIFICATION
for (theme_value, type_value), definition in sorted(THEME_TYPE_CLASSIFICATION.items()):
print(theme_value, type_value, definition.hierachy_columns)
base infrastructure ['subtype', 'class']
base land ['subtype', 'class']
base land_cover ['subtype']
base land_use ['subtype', 'class']
base water ['subtype', 'class']
buildings building ['subtype', 'class']
places place ['1', '2', '3', '4', '5', '6']
transportation segment ['subtype', 'class', 'subclass']
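Because the column names seen in this notebook join the theme, the type and the hierarchy values with a `|` character, a column name can be split back into its parts. `split_wide_column` is a hypothetical helper written for illustration; it is not part of OvertureMaestro.

```python
def split_wide_column(column_name: str) -> dict:
    """Split a wide-format column name into theme, type and hierarchy values."""
    # The first two parts are the theme and the type; the rest is the hierarchy.
    theme, type_, *hierarchy = column_name.split("|")
    return {"theme": theme, "type": type_, "hierarchy": hierarchy}


print(split_wide_column("transportation|segment|road|footway|sidewalk"))
print(split_wide_column("base|land_cover|shrub"))
```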
Multiple data types
You can also download data for multiple theme/type pairs at once, or even for all of them.
If some datasets have been downloaded during previous executions, then only missing data is downloaded.
Here we will look at the top 10 most common features for both examples.
from overturemaestro.advanced_functions import (
convert_geometry_to_wide_form_geodataframe_for_all_types,
convert_geometry_to_wide_form_geodataframe_for_multiple_types,
)
two_datasets_gdf = convert_geometry_to_wide_form_geodataframe_for_multiple_types(
[("base", "water"), ("base", "land_cover")], amsterdam
)
two_datasets_gdf.drop(columns="geometry").sum().sort_values(ascending=False).head(10)
Finished operation in 0:00:09
base|water|human_made|swimming_pool 46824
base|water|stream|stream 665
base|land_cover|shrub 505
base|water|water|water 498
base|land_cover|barren 475
base|land_cover|forest 442
base|water|pond|pond 352
base|water|river|river 157
base|land_cover|wetland 129
base|water|water|wastewater 123
dtype: int64
len(two_datasets_gdf.columns)
47
all_datasets_gdf = convert_geometry_to_wide_form_geodataframe_for_all_types(
amsterdam, sort_result=False # we skip sorting the result here for faster execution
)
all_datasets_gdf.drop(columns="geometry").sum().sort_values(ascending=False).head(10)
Finished operation in 0:01:48
buildings|building 818371
base|infrastructure|barrier|kerb 123967
base|infrastructure|transportation|crossing 109622
buildings|building|residential|garage 103650
base|infrastructure|transit|parking_space 87404
transportation|segment|road|footway 68119
buildings|building|residential|detached 68058
transportation|segment|road|residential 59596
base|land|tree|tree 53675
transportation|segment|road|footway|sidewalk 51870
dtype: int64
len(all_datasets_gdf.columns)
2641
Limiting hierarchy depth
If you only need a higher-level aggregation of the data, you can limit the hierarchy depth.
By default, the full hierarchy is used to generate the columns.
Note
If you pass a value that is too high, it is automatically capped to the maximum depth available for the given theme/type pair.
limited_depth_water_gdf = convert_geometry_to_wide_form_geodataframe(
"base", "water", amsterdam, hierarchy_depth=1
)
limited_depth_water_gdf.drop(columns="geometry").sum()
Finished operation in 0:00:13
base|water|canal 161
base|water|human_made 46829
base|water|lake 6
base|water|ocean 9
base|water|physical 104
base|water|pond 352
base|water|reservoir 80
base|water|river 157
base|water|spring 2
base|water|stream 665
base|water|wastewater 0
base|water|water 700
dtype: int64
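The depth-1 counts above equal the sums of their full-depth children (for example, 14 + 110 + 37 + 0 = 161 for base|water|canal). The sketch below reproduces that relationship on a toy frame, assuming that limiting the depth behaves like a logical OR over the child columns. It is an illustration in plain pandas rather than the library's own code, and `truncate` is a hypothetical helper.

```python
import pandas as pd

# Toy full-depth frame; the rows are illustrative sample data.
full_depth = pd.DataFrame(
    {
        "base|water|canal|canal": [True, False, False, False],
        "base|water|canal|drain": [False, True, False, False],
        "base|water|human_made|swimming_pool": [False, False, True, False],
        "base|water|river|river": [False, False, False, True],
    }
)


def truncate(column_name: str, depth: int) -> str:
    # Keep the theme and type plus `depth` hierarchy levels.
    return "|".join(column_name.split("|")[: 2 + depth])


# Group columns by their truncated name; a parent is True if any child is True.
depth_1 = full_depth.T.groupby(lambda name: truncate(name, 1)).any().T
print(depth_1.sum())
```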
A value of 0 produces a single column per theme/type pair.
limited_depth_all_gdf = convert_geometry_to_wide_form_geodataframe_for_all_types(
amsterdam, hierarchy_depth=0
)
limited_depth_all_gdf.drop(columns="geometry").sum()
Finished operation in 0:00:45
base|infrastructure 504296
base|land 60141
base|land_cover 1714
base|land_use 43099
base|water 49065
buildings|building 1091533
places|place 186017
transportation|segment 299210
dtype: int64
You can also pass a list of depths when downloading data for multiple datasets at once. The list must have the same length as the list of theme/type pairs.
limited_depth_multiple_gdf = convert_geometry_to_wide_form_geodataframe_for_multiple_types(
[("places", "place"), ("base", "land_cover"), ("base", "water")],
amsterdam,
hierarchy_depth=[1, None, 0],
)
limited_depth_multiple_gdf.drop(columns="geometry").sum()
Finished operation in 0:00:11
base|land_cover|barren 475
base|land_cover|crop 58
base|land_cover|forest 442
base|land_cover|grass 0
base|land_cover|mangrove 0
base|land_cover|moss 0
base|land_cover|shrub 505
base|land_cover|snow 0
base|land_cover|urban 105
base|land_cover|wetland 129
base|water 49065
places|place|accommodation 2972
places|place|active_life 7329
places|place|arts_and_entertainment 7356
places|place|attractions_and_activities 9088
places|place|automotive 5605
places|place|beauty_and_spa 14496
places|place|business_to_business 11382
places|place|eat_and_drink 32622
places|place|education 9636
places|place|financial_service 9988
places|place|health_and_medical 25644
places|place|home_service 11005
places|place|mass_media 3238
places|place|pets 1342
places|place|private_establishments_and_corporates 522
places|place|professional_services 43519
places|place|public_service_and_government 14427
places|place|real_estate 8372
places|place|religious_organization 6091
places|place|retail 45736
places|place|structure_and_geography 666
places|place|travel 5484
dtype: int64
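The rules described in this section (a single value or one value per pair, None meaning the full hierarchy, and capping of values that are too high) can be summarised in a short sketch. `resolve_depths` and `MAX_DEPTH` are hypothetical names used only for illustration, not the library's internals; the maximum depths are the lengths of the hierarchy column lists printed earlier.

```python
# Maximum depth per theme/type pair, taken from the hierarchy columns printed earlier.
MAX_DEPTH = {
    ("base", "water"): 2,
    ("base", "land_cover"): 1,
    ("places", "place"): 6,
}


def resolve_depths(theme_type_pairs, hierarchy_depth=None):
    """Return the effective depth per pair (hypothetical helper)."""
    # A single value (or None) applies to every pair.
    if not isinstance(hierarchy_depth, (list, tuple)):
        hierarchy_depth = [hierarchy_depth] * len(theme_type_pairs)
    if len(hierarchy_depth) != len(theme_type_pairs):
        raise ValueError("hierarchy_depth must have one value per theme/type pair")

    resolved = {}
    for pair, depth in zip(theme_type_pairs, hierarchy_depth):
        max_depth = MAX_DEPTH[pair]
        # None means the full hierarchy; values that are too high are capped.
        resolved[pair] = max_depth if depth is None else min(depth, max_depth)
    return resolved


pairs = [("places", "place"), ("base", "land_cover"), ("base", "water")]
print(resolve_depths(pairs, [1, None, 0]))
```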
Places
The places data has a different schema than the other datasets, and it is the only one where a feature can have multiple categories at once: one primary category and, optionally, several alternate categories.
This structure is preserved in the wide format, so places is the only dataset where a single feature can have multiple True values across its columns.
OvertureMaestro uses the categories column, with its primary and alternate sub-fields, to categorize features. The hierarchy depth of 6 is based on the official taxonomy of possible categories.
Two pyarrow filters are applied automatically when downloading the data for the wide format: the confidence value must be >= 0.75 and categories cannot be empty.
import pyarrow.compute as pc
category_not_null_filter = pc.invert(pc.field("categories").is_null())
minimal_confidence_filter = pc.field("confidence") >= pc.scalar(0.75)
combined_filter = category_not_null_filter & minimal_confidence_filter
original_places_data = convert_geometry_to_geodataframe(
"places",
"place",
amsterdam,
pyarrow_filter=combined_filter,
columns_to_download=["id", "geometry", "categories", "confidence"],
)
original_places_data
Finished operation in 0:00:08
| id | geometry | categories | confidence |
|---|---|---|---|
| d1d9bdd1-c030-40b1-9ae9-84935d0fd1fb | POINT (-74.25304 40.48667) | {'primary': 'lighthouse', 'alternate': ['landm... | 0.939995 |
| 3a528566-cbfb-410f-ac23-07cf529fdf43 | POINT (-74.23893 40.49926) | {'primary': 'italian_restaurant', 'alternate':... | 0.950394 |
| 7290c346-53f6-494c-bd47-536260cd0356 | POINT (-74.24486 40.49938) | {'primary': 'park', 'alternate': ['playground'... | 0.995064 |
| 5023af81-eac2-415c-bea3-17dd86566f4e | POINT (-74.25182 40.499) | {'primary': 'landmark_and_historical_building'... | 0.939995 |
| dd5c25f1-e8d5-4379-a0d2-3eb116ac82e4 | POINT (-74.25322 40.5032) | {'primary': 'history_museum', 'alternate': ['m... | 0.995064 |
| ... | ... | ... | ... |
| 86f5dc3e-b1e4-44d7-9f1b-5fd07bd16aba | POINT (-73.76206 40.59256) | {'primary': 'amusement_park', 'alternate': ['b... | 0.770000 |
| e0b67409-7469-477d-9485-caae3b4b7a55 | POINT (-73.75876 40.59279) | {'primary': 'landmark_and_historical_building'... | 0.978538 |
| c1fe438b-20d5-45a6-bbaa-7ff00f9e75eb | POINT (-73.75774 40.59271) | {'primary': 'garbage_collection_service', 'alt... | 0.770000 |
| bf1c92a6-5a6d-4fe5-8ba0-d913eac86edb | POINT (-73.75544 40.59341) | {'primary': 'public_relations', 'alternate': [... | 0.894118 |
| dde201d3-ed19-4d3d-af46-0e6a7098d17e | POINT (-73.75333 40.59327) | {'primary': 'social_service_organizations', 'a... | 0.978538 |
186017 rows × 3 columns
first_index = (
# Find the first feature with more than one alternate category
original_places_data[original_places_data.categories.str.get("alternate").str.len() > 1]
.iloc[0]
.name
)
first_index, original_places_data.loc[first_index].categories
('d1d9bdd1-c030-40b1-9ae9-84935d0fd1fb',
{'primary': 'lighthouse',
'alternate': array(['landmark_and_historical_building', 'museum'], dtype=object)})
wide_form_places_data = convert_geometry_to_wide_form_geodataframe("places", "place", amsterdam)
wide_form_places_data
Finished operation in 0:00:58
| id | geometry | places\|place\|accommodation | places\|place\|accommodation\|bed_and_breakfast | places\|place\|accommodation\|cabin | places\|place\|accommodation\|campground | places\|place\|accommodation\|cottage | places\|place\|accommodation\|guest_house | places\|place\|accommodation\|health_retreats | places\|place\|accommodation\|holiday_rental_home | places\|place\|accommodation\|hostel | ... | places\|place\|travel\|transportation\|transport_interchange | places\|place\|travel\|transportation\|water_taxi | places\|place\|travel\|travel_services | places\|place\|travel\|travel_services\|luggage_storage | places\|place\|travel\|travel_services\|passport_and_visa_services | places\|place\|travel\|travel_services\|passport_and_visa_services\|visa_agent | places\|place\|travel\|travel_services\|travel_agents | places\|place\|travel\|travel_services\|travel_agents\|sightseeing_tour_agency | places\|place\|travel\|travel_services\|visitor_center | places\|place\|travel\|vacation_rental_agents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| d1d9bdd1-c030-40b1-9ae9-84935d0fd1fb | POINT (-74.25304 40.48667) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3a528566-cbfb-410f-ac23-07cf529fdf43 | POINT (-74.23893 40.49926) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 7290c346-53f6-494c-bd47-536260cd0356 | POINT (-74.24486 40.49938) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 5023af81-eac2-415c-bea3-17dd86566f4e | POINT (-74.25182 40.499) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| dd5c25f1-e8d5-4379-a0d2-3eb116ac82e4 | POINT (-74.25322 40.5032) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 86f5dc3e-b1e4-44d7-9f1b-5fd07bd16aba | POINT (-73.76206 40.59256) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| e0b67409-7469-477d-9485-caae3b4b7a55 | POINT (-73.75876 40.59279) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| c1fe438b-20d5-45a6-bbaa-7ff00f9e75eb | POINT (-73.75774 40.59271) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| bf1c92a6-5a6d-4fe5-8ba0-d913eac86edb | POINT (-73.75544 40.59341) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| dde201d3-ed19-4d3d-af46-0e6a7098d17e | POINT (-73.75333 40.59327) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
186017 rows × 2117 columns
As you can see, only the columns for categories listed in the categories column are True; all other columns are False.
wide_form_places_data.loc[first_index].drop("geometry").sort_values(ascending=False)
places|place|attractions_and_activities|landmark_and_historical_building True
places|place|attractions_and_activities|museum True
places|place|attractions_and_activities|lighthouse True
places|place|pets|pet_services|farrier_services False
places|place|professional_services|emergency_service False
...
places|place|eat_and_drink|bar|milkshake_bar False
places|place|eat_and_drink|bar|milk_bar False
places|place|eat_and_drink|bar|lounge False
places|place|eat_and_drink|bar|kombucha False
places|place|travel|vacation_rental_agents False
Name: d1d9bdd1-c030-40b1-9ae9-84935d0fd1fb, Length: 2116, dtype: object
You can set places_use_primary_category_only to use only the single primary category per feature, without alternates.
primary_only_wide_form_places_data = convert_geometry_to_wide_form_geodataframe(
"places",
"place",
amsterdam,
places_use_primary_category_only=True,
)
primary_only_wide_form_places_data.loc[first_index].drop("geometry").sort_values(ascending=False)
Finished operation in 0:00:12
places|place|attractions_and_activities|lighthouse True
places|place|professional_services|construction_services|stone_and_masonry|masonry_contractors False
places|place|professional_services|electrical_consultant False
places|place|professional_services|elder_care_planning False
places|place|professional_services|editorial_services False
...
places|place|eat_and_drink|bar|lounge False
places|place|eat_and_drink|bar|kombucha False
places|place|eat_and_drink|bar|irish_pub False
places|place|eat_and_drink|bar|hotel_bar False
places|place|travel|vacation_rental_agents False
Name: d1d9bdd1-c030-40b1-9ae9-84935d0fd1fb, Length: 2116, dtype: object
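The two outputs for this lighthouse feature can be mimicked with a small pure-Python sketch. `TAXONOMY_PATHS` and `true_columns` are hypothetical names; the mapping holds only the three paths visible in the output above and merely stands in for the full taxonomy lookup, which may work differently in the library.

```python
# Hypothetical excerpt of the taxonomy: category name -> path below places|place.
TAXONOMY_PATHS = {
    "lighthouse": "attractions_and_activities|lighthouse",
    "landmark_and_historical_building": (
        "attractions_and_activities|landmark_and_historical_building"
    ),
    "museum": "attractions_and_activities|museum",
}


def true_columns(categories: dict, primary_only: bool = False) -> list:
    """Return the wide-format columns that would be True for one feature."""
    names = [categories["primary"]]
    if not primary_only:
        # Alternate categories are optional, so tolerate a missing or empty list.
        names.extend(categories.get("alternate") or [])
    return sorted(f"places|place|{TAXONOMY_PATHS[name]}" for name in names)


lighthouse = {
    "primary": "lighthouse",
    "alternate": ["landmark_and_historical_building", "museum"],
}
print(true_columns(lighthouse))
print(true_columns(lighthouse, primary_only=True))
```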
Below you can see the difference in the counts of True values across all columns.
wide_form_places_data.drop(columns="geometry").sum().sort_values(ascending=False)
places|place|professional_services 20659
places|place|eat_and_drink|restaurant 17973
places|place|health_and_medical 14241
places|place|beauty_and_spa|beauty_salon 9406
places|place|beauty_and_spa 9027
...
places|place|eat_and_drink|restaurant|fish_restaurant 0
places|place|real_estate|kitchen_incubator 0
places|place|real_estate|housing_cooperative 0
places|place|eat_and_drink|restaurant|guamanian_restaurant 0
places|place|retail|food|coffee_and_tea_supplies 0
Length: 2116, dtype: int64
primary_only_wide_form_places_data.drop(columns="geometry").sum().sort_values(ascending=False)
places|place|professional_services 4690
places|place|religious_organization|church_cathedral 3080
places|place|beauty_and_spa|beauty_salon 3000
places|place|public_service_and_government|community_services 2816
places|place|health_and_medical|dentist 2285
...
places|place|active_life|sports_and_recreation_venue|futsal_field 0
places|place|eat_and_drink|bar|milkshake_bar 0
places|place|professional_services|product_design 0
places|place|eat_and_drink|bar|milk_bar 0
places|place|arts_and_entertainment|stadium_arena|tennis_stadium 0
Length: 2116, dtype: int64
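To compare the two results directly, the count Series can be subtracted; pandas aligns them on the column names. This sketch uses two of the counts printed above instead of the full results.

```python
import pandas as pd

# Two counts copied from the outputs above (all categories vs. primary only).
all_counts = pd.Series(
    {
        "places|place|professional_services": 20659,
        "places|place|beauty_and_spa|beauty_salon": 9406,
    }
)
primary_counts = pd.Series(
    {
        "places|place|professional_services": 4690,
        "places|place|beauty_and_spa|beauty_salon": 3000,
    }
)

# Features that carry a category only as an alternate, per column.
difference = (all_counts - primary_counts).sort_values(ascending=False)
print(difference)
```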
You can also change the minimal confidence value with the places_minimal_confidence parameter.
convert_geometry_to_wide_form_geodataframe(
"places", "place", amsterdam, places_minimal_confidence=0.95
)
Finished operation in 0:00:59
| id | geometry | places\|place\|accommodation | places\|place\|accommodation\|bed_and_breakfast | places\|place\|accommodation\|cabin | places\|place\|accommodation\|campground | places\|place\|accommodation\|cottage | places\|place\|accommodation\|guest_house | places\|place\|accommodation\|health_retreats | places\|place\|accommodation\|holiday_rental_home | places\|place\|accommodation\|hostel | ... | places\|place\|travel\|transportation\|transport_interchange | places\|place\|travel\|transportation\|water_taxi | places\|place\|travel\|travel_services | places\|place\|travel\|travel_services\|luggage_storage | places\|place\|travel\|travel_services\|passport_and_visa_services | places\|place\|travel\|travel_services\|passport_and_visa_services\|visa_agent | places\|place\|travel\|travel_services\|travel_agents | places\|place\|travel\|travel_services\|travel_agents\|sightseeing_tour_agency | places\|place\|travel\|travel_services\|visitor_center | places\|place\|travel\|vacation_rental_agents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8f12e63e-1c93-4c60-a7bc-555d801104c3 | POINT (-74.25125 40.50058) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| dd5c25f1-e8d5-4379-a0d2-3eb116ac82e4 | POINT (-74.25322 40.5032) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| e011bf43-1757-4b53-af9c-13ca7cb715d0 | POINT (-74.25416 40.50543) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| f541d0aa-d4bf-42a2-90e2-6bde6f3e3271 | POINT (-74.25427 40.50549) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 7290c346-53f6-494c-bd47-536260cd0356 | POINT (-74.24486 40.49938) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| a76f8690-d0ec-481d-a688-070ea8606c66 | POINT (-73.74031 40.59847) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 7f421a62-b7d1-467d-813f-1c6bc9b56d28 | POINT (-73.73994 40.59916) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| a14f3cd0-961c-4c19-b4dd-ca802ebae349 | POINT (-73.74217 40.59955) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| f44ebbb1-bff7-41a3-b5b1-06a67944602a | POINT (-73.74159 40.59542) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 60fd5f49-19b3-4c7c-a825-5026a0b55651 | POINT (-73.74218 40.59545) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
85743 rows × 2117 columns
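Raising the threshold shrinks the result because features below it are dropped before pivoting. Here is a minimal sketch of that filter on four confidence values copied from the original places table; the short ids and the `filter_by_confidence` helper are hypothetical.

```python
# Confidence values copied from the original places table; ids are shortened placeholders.
places = [
    {"id": "a", "confidence": 0.939995},
    {"id": "b", "confidence": 0.950394},
    {"id": "c", "confidence": 0.995064},
    {"id": "d", "confidence": 0.770000},
]


def filter_by_confidence(rows, minimal_confidence=0.75):
    # Keep features whose confidence is at least the threshold.
    return [row for row in rows if row["confidence"] >= minimal_confidence]


default_ids = [row["id"] for row in filter_by_confidence(places)]
strict_ids = [row["id"] for row in filter_by_confidence(places, 0.95)]
print(default_ids, strict_ids)
```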
The full hierarchy of the places dataset is derived from the official taxonomy.
You can limit the depth to get fewer columns with grouped categories.
convert_geometry_to_wide_form_geodataframe("places", "place", amsterdam, hierarchy_depth=1)
Finished operation in 0:00:07
| id | geometry | places\|place\|accommodation | places\|place\|active_life | places\|place\|arts_and_entertainment | places\|place\|attractions_and_activities | places\|place\|automotive | places\|place\|beauty_and_spa | places\|place\|business_to_business | places\|place\|eat_and_drink | places\|place\|education | ... | places\|place\|mass_media | places\|place\|pets | places\|place\|private_establishments_and_corporates | places\|place\|professional_services | places\|place\|public_service_and_government | places\|place\|real_estate | places\|place\|religious_organization | places\|place\|retail | places\|place\|structure_and_geography | places\|place\|travel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| d1d9bdd1-c030-40b1-9ae9-84935d0fd1fb | POINT (-74.25304 40.48667) | False | False | False | True | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3a528566-cbfb-410f-ac23-07cf529fdf43 | POINT (-74.23893 40.49926) | False | False | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | False | False |
| 7290c346-53f6-494c-bd47-536260cd0356 | POINT (-74.24486 40.49938) | False | True | False | True | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 5023af81-eac2-415c-bea3-17dd86566f4e | POINT (-74.25182 40.499) | False | False | False | True | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| dd5c25f1-e8d5-4379-a0d2-3eb116ac82e4 | POINT (-74.25322 40.5032) | False | False | False | True | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 86f5dc3e-b1e4-44d7-9f1b-5fd07bd16aba | POINT (-73.76206 40.59256) | False | False | False | True | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| e0b67409-7469-477d-9485-caae3b4b7a55 | POINT (-73.75876 40.59279) | False | False | False | True | False | False | False | False | False | ... | False | False | False | False | False | True | False | False | False | False |
| c1fe438b-20d5-45a6-bbaa-7ff00f9e75eb | POINT (-73.75774 40.59271) | False | False | False | False | False | False | False | False | False | ... | False | False | False | True | False | False | False | False | False | False |
| bf1c92a6-5a6d-4fe5-8ba0-d913eac86edb | POINT (-73.75544 40.59341) | False | False | False | False | False | False | False | False | False | ... | False | False | False | True | False | False | False | False | False | False |
| dde201d3-ed19-4d3d-af46-0e6a7098d17e | POINT (-73.75333 40.59327) | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | True | False | False | False | False | False |
186017 rows × 23 columns
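The grouping effect of hierarchy_depth can be pictured as truncating each column name after the theme, the type and the requested number of category levels, then combining the flags of all columns that end up with the same name. The snippet below is a minimal sketch of that idea in plain Python, not OvertureMaestro's actual implementation. The helpers truncate_column_name and group_row, as well as the deeper category names in the sample row, are hypothetical and used only for illustration.

```python
from collections import defaultdict

SEPARATOR = "|"  # separator used in the wide format column names


def truncate_column_name(column: str, hierarchy_depth: int) -> str:
    """Keep the theme, the type and the first `hierarchy_depth` category levels."""
    parts = column.split(SEPARATOR)
    return SEPARATOR.join(parts[: 2 + hierarchy_depth])


def group_row(row: dict[str, bool], hierarchy_depth: int) -> dict[str, bool]:
    """Combine (logical OR) the flags of all columns sharing a truncated name."""
    grouped: dict[str, bool] = defaultdict(bool)
    for column, value in row.items():
        grouped[truncate_column_name(column, hierarchy_depth)] |= value
    return dict(grouped)


# Hypothetical full-depth row; the deeper category names are made up.
row = {
    "places|place|eat_and_drink|restaurant|pizza_restaurant": True,
    "places|place|eat_and_drink|cafe|coffee_shop": False,
    "places|place|retail|shopping|bookstore": False,
}

grouped = group_row(row, hierarchy_depth=1)
print(grouped)
```

Under this assumption, a feature is flagged in a grouped column whenever it is flagged in at least one of the more detailed columns below it.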
Pruning final list of columns¶
By default, OvertureMaestro includes all possible columns, regardless of whether any features of a given category exist in the requested area.
This keeps the overall schema consistent across different geographical regions and simplifies the feature engineering process.
However, you can set the dedicated include_all_possible_columns parameter to False to keep only the columns for categories that actually occur in the data.
convert_geometry_to_wide_form_geodataframe(
"base", "infrastructure", amsterdam, include_all_possible_columns=True # default value
)
Finished operation in 0:00:06
| id | geometry | base\|infrastructure\|aerialway\|aerialway_station | base\|infrastructure\|aerialway\|cable_car | base\|infrastructure\|aerialway\|chair_lift | base\|infrastructure\|aerialway\|drag_lift | base\|infrastructure\|aerialway\|gondola | base\|infrastructure\|aerialway\|goods | base\|infrastructure\|aerialway\|j-bar | base\|infrastructure\|aerialway\|magic_carpet | base\|infrastructure\|aerialway\|mixed_lift | ... | base\|infrastructure\|utility\|utility_pole | base\|infrastructure\|utility\|water_tower | base\|infrastructure\|waste_management\|recycling | base\|infrastructure\|waste_management\|waste_basket | base\|infrastructure\|waste_management\|waste_disposal | base\|infrastructure\|water\|breakwater | base\|infrastructure\|water\|dam | base\|infrastructure\|water\|drinking_water | base\|infrastructure\|water\|fountain | base\|infrastructure\|water\|weir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25627f2a-664d-35ca-a7c5-90c5fea051e7 | POINT (-74.25192 40.50054) | False | False | False | False | False | False | False | False | False | ... | False | False | False | True | False | False | False | False | False | False |
| 957f3e16-47bd-3a44-87cb-fd90531b2e2a | LINESTRING (-74.25409 40.5025, -74.25407 40.50... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 8274f299-be66-3bf2-aea0-ad4069582a12 | LINESTRING (-74.25379 40.50246, -74.25378 40.5... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| aa1a50f1-2a37-34f0-bc72-62b32cf116f5 | LINESTRING (-74.25388 40.50239, -74.25386 40.5... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 517cf5aa-6097-3195-9a0c-5bdaa3fafc5d | LINESTRING (-74.2538 40.50239, -74.25382 40.50... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| c2f6482b-2976-3841-98bc-30953ba98092 | POLYGON ((-73.93049 40.55266, -73.93047 40.552... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3bb1a4c5-8e49-357b-9831-42694aecc364 | POLYGON ((-73.93052 40.55265, -73.9305 40.5526... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| a3794d61-4b03-353e-89a6-e84ca36de853 | POLYGON ((-73.93043 40.55267, -73.93041 40.552... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 7d7fc443-8f5f-3efc-9d96-f35607c87c78 | POLYGON ((-73.9304 40.55268, -73.93038 40.5526... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 089a3ed4-6829-3838-9b94-18583e45542a | POLYGON ((-73.93037 40.55269, -73.93035 40.552... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
504296 rows × 161 columns
convert_geometry_to_wide_form_geodataframe(
"base", "infrastructure", amsterdam, include_all_possible_columns=False
)
Finished operation in 0:00:06
| id | geometry | base\|infrastructure\|aerialway\|aerialway_station | base\|infrastructure\|aerialway\|cable_car | base\|infrastructure\|aerialway\|pylon | base\|infrastructure\|airport\|airport_gate | base\|infrastructure\|airport\|apron | base\|infrastructure\|airport\|helipad | base\|infrastructure\|airport\|heliport | base\|infrastructure\|airport\|international_airport | base\|infrastructure\|airport\|runway | ... | base\|infrastructure\|utility\|utility_pole | base\|infrastructure\|utility\|water_tower | base\|infrastructure\|waste_management\|recycling | base\|infrastructure\|waste_management\|waste_basket | base\|infrastructure\|waste_management\|waste_disposal | base\|infrastructure\|water\|breakwater | base\|infrastructure\|water\|dam | base\|infrastructure\|water\|drinking_water | base\|infrastructure\|water\|fountain | base\|infrastructure\|water\|weir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25627f2a-664d-35ca-a7c5-90c5fea051e7 | POINT (-74.25192 40.50054) | False | False | False | False | False | False | False | False | False | ... | False | False | False | True | False | False | False | False | False | False |
| 957f3e16-47bd-3a44-87cb-fd90531b2e2a | LINESTRING (-74.25409 40.5025, -74.25407 40.50... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 8274f299-be66-3bf2-aea0-ad4069582a12 | LINESTRING (-74.25379 40.50246, -74.25378 40.5... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| aa1a50f1-2a37-34f0-bc72-62b32cf116f5 | LINESTRING (-74.25388 40.50239, -74.25386 40.5... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 517cf5aa-6097-3195-9a0c-5bdaa3fafc5d | LINESTRING (-74.2538 40.50239, -74.25382 40.50... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| c2f6482b-2976-3841-98bc-30953ba98092 | POLYGON ((-73.93049 40.55266, -73.93047 40.552... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3bb1a4c5-8e49-357b-9831-42694aecc364 | POLYGON ((-73.93052 40.55265, -73.9305 40.5526... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| a3794d61-4b03-353e-89a6-e84ca36de853 | POLYGON ((-73.93043 40.55267, -73.93041 40.552... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 7d7fc443-8f5f-3efc-9d96-f35607c87c78 | POLYGON ((-73.9304 40.55268, -73.93038 40.5526... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 089a3ed4-6829-3838-9b94-18583e45542a | POLYGON ((-73.93037 40.55269, -73.93035 40.552... | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
504296 rows × 118 columns
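The difference between the two results above (161 versus 118 columns for the same 504296 rows) comes down to dropping columns in which no feature is flagged. The following plain-Python sketch illustrates that pruning on a tiny table. It only demonstrates the concept and makes no claim about how OvertureMaestro decides which columns to keep internally. The helper prune_empty_columns and the flag values are hypothetical.

```python
def prune_empty_columns(columns: dict[str, list[bool]]) -> dict[str, list[bool]]:
    """Keep only the columns that contain at least one True value."""
    return {name: values for name, values in columns.items() if any(values)}


# Hypothetical three-row table; the flag values are made up for illustration.
table = {
    "base|infrastructure|aerialway|chair_lift": [False, False, False],
    "base|infrastructure|waste_management|waste_basket": [True, False, False],
    "base|infrastructure|water|fountain": [False, False, True],
}

pruned = prune_empty_columns(table)
print(list(pruned))
```

Keep in mind that a pruned schema depends on the region, so two areas processed this way may end up with different sets of columns.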
Getting a full list of possible column names¶
You can also preview the final list of columns before downloading the data using the get_all_possible_column_names function.
You can specify the release, theme and type, as well as hierarchy_depth.
from overturemaestro.advanced_functions.wide_form import get_all_possible_column_names
get_all_possible_column_names(theme="base", type="water")
['base|water|canal|canal', 'base|water|canal|ditch', 'base|water|canal|drain', 'base|water|canal|moat', 'base|water|human_made|fish_pass', 'base|water|human_made|reflecting_pool', 'base|water|human_made|salt_pond', 'base|water|human_made|swimming_pool', 'base|water|lake|lagoon', 'base|water|lake|lake', 'base|water|lake|oxbow', 'base|water|ocean|ocean', 'base|water|physical|bay', 'base|water|physical|cape', 'base|water|physical|ocean', 'base|water|physical|sea', 'base|water|physical|shoal', 'base|water|physical|strait', 'base|water|physical|waterfall', 'base|water|pond|fishpond', 'base|water|pond|pond', 'base|water|reservoir|basin', 'base|water|reservoir|reservoir', 'base|water|reservoir|water_storage', 'base|water|river|river', 'base|water|spring|blowhole', 'base|water|spring|geyser', 'base|water|spring|hot_spring', 'base|water|spring|spring', 'base|water|stream|stream', 'base|water|wastewater|sewage', 'base|water|water|dock', 'base|water|water|fairway', 'base|water|water|tidal_channel', 'base|water|water|wastewater', 'base|water|water|water']
With all parameters left empty, the function returns the full list of all possible columns at maximal depth.
columns = get_all_possible_column_names()
len(columns)
2640
columns[:10]
['base|infrastructure|aerialway|aerialway_station', 'base|infrastructure|aerialway|cable_car', 'base|infrastructure|aerialway|chair_lift', 'base|infrastructure|aerialway|drag_lift', 'base|infrastructure|aerialway|gondola', 'base|infrastructure|aerialway|goods', 'base|infrastructure|aerialway|j-bar', 'base|infrastructure|aerialway|magic_carpet', 'base|infrastructure|aerialway|mixed_lift', 'base|infrastructure|aerialway|platter']
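Because every column name starts with the theme and the type, a long list like this one can be summarised by counting columns per theme and type pair. The sketch below does this with the standard library only, on a short hand-written sample of names taken from the outputs in this notebook, so it runs without downloading anything. The helper count_columns_per_type is hypothetical and not part of OvertureMaestro.

```python
from collections import Counter


def count_columns_per_type(columns: list[str]) -> Counter:
    """Count column names per (theme, type) pair."""
    counts: Counter = Counter()
    for column in columns:
        # The first two "|"-separated parts are the theme and the type.
        theme, feature_type = column.split("|")[:2]
        counts[(theme, feature_type)] += 1
    return counts


# Short sample of column names copied from the outputs in this notebook.
sample = [
    "base|infrastructure|aerialway|aerialway_station",
    "base|infrastructure|aerialway|cable_car",
    "base|water|canal|canal",
    "base|water|canal|ditch",
    "base|water|lake|lake",
    "buildings|building|civic",
]

counts = count_columns_per_type(sample)
for (theme, feature_type), number in counts.items():
    print(theme, feature_type, number)
```

The same helper could be applied to the list returned by get_all_possible_column_names to see which datasets contribute the most columns.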
You can also specify different hierarchy_depth values.
get_all_possible_column_names(theme="buildings", type="building", hierarchy_depth=1)
['buildings|building', 'buildings|building|agricultural', 'buildings|building|civic', 'buildings|building|commercial', 'buildings|building|education', 'buildings|building|entertainment', 'buildings|building|industrial', 'buildings|building|medical', 'buildings|building|military', 'buildings|building|outbuilding', 'buildings|building|religious', 'buildings|building|residential', 'buildings|building|service', 'buildings|building|transportation']