OSM PBF Loader
OSMPbfLoader can quickly parse a full OSM extract in the form of a *.osm.pbf file.
It can download and parse a large number of features much faster than OSMOnlineLoader, so it is most useful when many different features are required at once (for example, when using predefined filters).
When only one or a few features are needed, OSMOnlineLoader might be a better choice, since OSMPbfLoader uses a full extract of all features in a given region and has to iterate over all of them.
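To make this trade-off concrete, here is a minimal, library-independent sketch of what scanning a full extract involves: every element is visited and kept only if one of its tags matches the filter. The matches_filter helper and the toy elements list are hypothetical and are not part of srai. The filter shape (an OSM key mapped to a list of accepted values, or True for any value) is an assumption modelled on the tag dictionaries shown later on this page.

```python
from typing import Dict, List, Union

# Assumed filter shape: OSM key -> accepted values, or True for "any value".
TagsFilter = Dict[str, Union[List[str], bool]]


def matches_filter(tags: Dict[str, str], tags_filter: TagsFilter) -> bool:
    """Return True if at least one tag of an element is accepted by the filter."""
    for key, accepted in tags_filter.items():
        value = tags.get(key)
        if value is None:
            continue  # the element does not carry this key at all
        if accepted is True:
            return True  # any value of this key is accepted
        if isinstance(accepted, list) and value in accepted:
            return True
    return False


# Toy stand-in for the elements of an extract (hypothetical data).
elements = [
    {"id": "node/1", "tags": {"amenity": "restaurant"}},
    {"id": "node/2", "tags": {"highway": "crossing"}},
    {"id": "way/3", "tags": {"building": "yes", "name": "Example"}},
]
tags_filter: TagsFilter = {"amenity": ["restaurant", "cafe"], "building": True}

# Every element is inspected, even though only some of them are kept.
kept = [e["id"] for e in elements if matches_filter(e["tags"], tags_filter)]
print(kept)
```

The cost of the scan depends on the size of the extract rather than on the size of the filter, which is why a broad filter gets more out of a single pass.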
In [1]:
from srai.loaders.osm_loaders.filters import HEX2VEC_FILTER, GEOFABRIK_LAYERS
from srai.loaders.osm_loaders.filters.popular import get_popular_tags
from srai.loaders.osm_loaders import OSMPbfLoader
from srai.constants import REGIONS_INDEX, WGS84_CRS
from srai.regionalizers import geocode_to_region_gdf
from srai.geometry import buffer_geometry
from shapely.geometry import Point, box
import geopandas as gpd
Using OSMPbfLoader to download data for a specific area
Download all features from HEX2VEC_FILTER in Warsaw, Poland
In [2]:
loader = OSMPbfLoader()
warsaw_gdf = geocode_to_region_gdf("Warsaw, Poland")
warsaw_features_gdf = loader.load(warsaw_gdf, HEX2VEC_FILTER)
warsaw_features_gdf
[Warsaw, Masovian Voivodeship, Poland] Downloading pbf file #1 (Elements): 100%|██████████| 9063336/9063336 [00:12<00:00, 698439.68it/s]
cb86c293310b63a79b078e59ffd1ffa1828d10b2a586df0fee66bbb6af2ec1dc.osm.pbf: 100%|██████████| 41.8M/41.8M [00:08<00:00, 5.18MiB/s]
[Warsaw, Masovian Voivodeship, Poland] Counting pbf features: 5247986it [00:22, 236262.46it/s]
[Warsaw, Masovian Voivodeship, Poland] Parsing pbf file #1: 98%|█████████▊| 5140154/5247986 [02:05<00:03, 27120.48it/s]
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/srai/loaders/osm_loaders/pbf_file_handler.py:222: RuntimeWarning: invalid area (area_id=29859113)
  geometry = self._get_osm_geometry(osm_object, parse_to_wkb_function)
[Warsaw, Masovian Voivodeship, Poland] Parsing pbf file #1: 100%|█████████▉| 5243226/5247986 [02:09<00:00, 27557.59it/s]
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/srai/loaders/osm_loaders/pbf_file_handler.py:222: RuntimeWarning: invalid area (area_id=32604155)
  geometry = self._get_osm_geometry(osm_object, parse_to_wkb_function)
[Warsaw, Masovian Voivodeship, Poland] Parsing pbf file #1: 100%|██████████| 5247986/5247986 [02:11<00:00, 39940.36it/s]
Out[2]:
feature_id | geometry | aeroway | amenity | building | healthcare | historic | landuse | leisure | military | natural | office | shop | sport | tourism | water | waterway |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
node/31005854 | POINT (20.94595 52.17691) | NaN | restaurant | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/31156693 | POINT (20.95489 52.27100) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | bakery | NaN | NaN | NaN | NaN |
node/31917380 | POINT (21.01451 52.21653) | NaN | NaN | NaN | NaN | memorial | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/32599714 | POINT (21.01518 52.21904) | NaN | parking_entrance | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/33238753 | POINT (20.92711 52.33046) | NaN | ferry_terminal | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
relation/5669033 | MULTIPOLYGON (((21.18079 52.20211, 21.18260 52... | NaN | NaN | NaN | NaN | NaN | forest | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/16128958 | MULTIPOLYGON (((21.01609 52.23917, 21.01614 52... | NaN | university | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/16452497 | MULTIPOLYGON (((20.95206 52.26387, 20.95231 52... | NaN | NaN | ruins | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/12533540 | MULTIPOLYGON (((20.95126 52.21739, 20.95203 52... | NaN | NaN | NaN | NaN | NaN | construction | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/16500414 | MULTIPOLYGON (((20.89688 52.23994, 20.89701 52... | NaN | NaN | NaN | NaN | NaN | construction | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
302804 rows × 16 columns
Plot features
Inspired by prettymaps
In [3]:
clipped_features_gdf = warsaw_features_gdf.clip(warsaw_gdf.geometry.unary_union)
In [4]:
ax = warsaw_gdf.plot(color="lavender", figsize=(16, 16))

# plot water
clipped_features_gdf.dropna(subset=["water", "waterway"], how="all").plot(
    ax=ax, color="deepskyblue"
)

# plot greenery
clipped_features_gdf[
    clipped_features_gdf["landuse"].isin(
        ["grass", "orchard", "flowerbed", "forest", "greenfield", "meadow"]
    )
].plot(ax=ax, color="mediumseagreen")

# plot buildings
clipped_features_gdf.dropna(subset=["building"], how="all").plot(
    ax=ax, color="dimgray", markersize=0.1
)

xmin, ymin, xmax, ymax = warsaw_gdf.total_bounds
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.set_axis_off()
Download all features from popular tags based on OSMTagInfo in Vienna, Austria
In [5]:
popular_tags = get_popular_tags(in_wiki_only=True)
num_keys = len(popular_tags)
f"Unique keys: {num_keys}."
Out[5]:
'Unique keys: 317.'
In [6]:
{k: popular_tags[k] for k in list(popular_tags)[:10]}
Out[6]:
{'4wd_only': ['yes'], 'LandPro08:reviewed': ['no'], 'abandoned': ['yes'], 'abandoned:railway': ['rail'], 'abutters': ['residential'], 'access': ['agricultural', 'customers', 'delivery', 'designated', 'destination', 'forestry', 'no', 'permissive', 'permit', 'private', 'unknown', 'yes'], 'addr:TW:dataset': ['137998'], 'addr:country': ['CZ'], 'addr:state': ['AZ', 'CA', 'CT', 'FL', 'KY', 'MD', 'ME', 'NY', 'TX'], 'admin_level': ['10', '11', '2', '4', '5', '6', '7', '8', '9']}
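The dictionary maps each OSM key to a list of its popular values, so it can be trimmed with ordinary dictionary operations before being used as a filter. Below is a minimal sketch that uses a small hand-written sample in place of the real get_popular_tags result. The select_keys helper is hypothetical and not part of srai.

```python
from typing import Dict, Iterable, List


def select_keys(tags: Dict[str, List[str]], keys: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only the requested keys that are present in the tag dictionary."""
    wanted = set(keys)
    return {key: values for key, values in tags.items() if key in wanted}


# Hand-written sample with the same shape as the output above (not the full result).
sample_tags = {
    "abandoned": ["yes"],
    "access": ["customers", "private", "yes"],
    "admin_level": ["2", "4", "6"],
}

# Requested keys that are missing from the dictionary are silently skipped.
access_only = select_keys(sample_tags, ["access", "amenity"])
print(access_only)
```

A narrower dictionary produces fewer columns in the resulting GeoDataFrame, which can make the output easier to inspect.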
In [7]:
vienna_center_circle = buffer_geometry(Point(16.37009, 48.20931), meters=1000)
vienna_center_circle_gdf = gpd.GeoDataFrame(
    geometry=[vienna_center_circle],
    crs=WGS84_CRS,
    index=gpd.pd.Index(data=["Vienna"], name=REGIONS_INDEX),
)
In [8]:
loader = OSMPbfLoader()
vienna_features_gdf = loader.load(vienna_center_circle_gdf, popular_tags)
vienna_features_gdf
[Vienna] Downloading pbf file #1 (Elements): 100%|██████████| 191725/191725 [00:01<00:00, 136102.73it/s]
b3bcf5d17ff3e9d002dc3bbaef9dc5b3979a45e7dedfbc63c8281af2bdc5827d.osm.pbf: 100%|██████████| 1.55M/1.55M [00:00<00:00, 11.9MiB/s]
[Vienna] Counting pbf features: 109482it [00:00, 219447.23it/s]
[Vienna] Parsing pbf file #1: 100%|██████████| 109482/109482 [00:31<00:00, 3480.40it/s]
Out[8]:
feature_id | geometry | access | admin_level | advertising | amenity | area | area:highway | artwork_type | atm | barrier | ... | tram | tunnel | type | vehicle | vending | waste | water | water_source | waterway | wheelchair |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
node/199732 | POINT (16.36007 48.20843) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/199748 | POINT (16.35783 48.21256) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/199753 | POINT (16.35756 48.21165) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/392790 | POINT (16.36620 48.21654) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/395420 | POINT (16.36053 48.20634) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
relation/6149840 | MULTIPOLYGON (((16.36844 48.20032, 16.36845 48... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | yes | NaN | NaN | NaN | NaN | NaN | NaN | NaN | yes |
relation/14972437 | MULTIPOLYGON (((16.36625 48.20066, 16.36632 48... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | yes | NaN | NaN | NaN | NaN | NaN | NaN | NaN | yes |
relation/16221369 | MULTIPOLYGON (((16.36491 48.21639, 16.36493 48... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/16221368 | MULTIPOLYGON (((16.36469 48.21660, 16.36480 48... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/16241555 | MULTIPOLYGON (((16.35858 48.20522, 16.35866 48... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | yes | NaN | NaN | NaN | NaN | NaN | NaN | NaN | yes |
21211 rows × 167 columns
Plot features
Uses the default preset colours from prettymaps
In [9]:
clipped_vienna_features_gdf = vienna_features_gdf.clip(vienna_center_circle)
In [10]:
ax = vienna_center_circle_gdf.plot(color="#F2F4CB", figsize=(16, 16))

# plot water
clipped_vienna_features_gdf.dropna(subset=["water", "waterway"], how="all").plot(
    ax=ax, color="#a8e1e6"
)

# plot streets
clipped_vienna_features_gdf.dropna(subset=["highway"], how="all").plot(
    ax=ax, color="#475657", markersize=0.1
)

# plot buildings
clipped_vienna_features_gdf.dropna(subset=["building"], how="all").plot(ax=ax, color="#FF5E5B")

# plot parkings
clipped_vienna_features_gdf[
    (clipped_vienna_features_gdf["amenity"] == "parking")
    | (clipped_vienna_features_gdf["highway"] == "pedestrian")
].plot(ax=ax, color="#2F3737", markersize=0.1)

# plot greenery
clipped_vienna_features_gdf[
    clipped_vienna_features_gdf["landuse"].isin(
        ["grass", "orchard", "flowerbed", "forest", "greenfield", "meadow"]
    )
].plot(ax=ax, color="#8BB174")

xmin, ymin, xmax, ymax = vienna_center_circle_gdf.total_bounds
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.set_axis_off()
Download all grouped features based on Geofabrik layers in New York, USA
In [11]:
manhattan_bbox = box(-73.994551, 40.762396, -73.936872, 40.804239)
manhattan_bbox_gdf = gpd.GeoDataFrame(
    geometry=[manhattan_bbox],
    crs=WGS84_CRS,
    index=gpd.pd.Index(data=["New York"], name=REGIONS_INDEX),
)
In [12]:
loader = OSMPbfLoader()
new_york_features_gdf = loader.load(manhattan_bbox_gdf, GEOFABRIK_LAYERS)
new_york_features_gdf
[New York] Downloading pbf file #1 (Elements): 100%|██████████| 545216/545216 [00:01<00:00, 384649.95it/s]
b64082dfa7a4ab8b76749246e4110c2d24dec79a8e5832b2a1cb05525f8d56bf.osm.pbf: 100%|██████████| 3.10M/3.10M [00:00<00:00, 3.55MiB/s]
[New York] Counting pbf features: 336968it [00:01, 237799.28it/s]
[New York] Parsing pbf file #1: 100%|██████████| 336968/336968 [00:12<00:00, 26668.05it/s]
Grouping features: 100%|██████████| 28/28 [00:01<00:00, 21.20it/s]
Out[12]:
feature_id | geometry | public | education | health | leisure | catering | accommodation | shopping | money | tourism | ... | major_roads | minor_roads | highway_links | very_small_roads | paths_unsuitable_for_cars | railways | waterways | buildings | landuse | water |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
node/42421728 | POINT (-73.96004 40.79805) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/42421731 | POINT (-73.96147 40.79865) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/42421737 | POINT (-73.96287 40.79924) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/42421741 | POINT (-73.96569 40.80043) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
node/42421745 | POINT (-73.96800 40.80140) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
way/1216690941 | LINESTRING (-73.93714 40.80166, -73.93752 40.8... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | highway=footway | NaN | NaN | NaN | NaN | NaN |
way/1217099496 | LINESTRING (-73.94523 40.78096, -73.94541 40.7... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | highway=residential | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
way/1217393802 | LINESTRING (-73.95992 40.78221, -73.95996 40.7... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | highway=secondary | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
relation/2389563 | MULTIPOLYGON (((-73.96780 40.74813, -73.96483 ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | natural=water |
relation/16442030 | MULTIPOLYGON (((-73.95366 40.78967, -73.95351 ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | landuse=industrial | NaN |
44718 rows × 27 columns
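With a grouped filter such as GEOFABRIK_LAYERS, the columns are group names and each cell holds the matched tag as a key=value string (for example, highway=footway in the paths_unsuitable_for_cars column above). The sketch below imitates that behaviour on toy data to show why comparisons on grouped columns use strings such as leisure=park. The group_tags helper and the two-group toy_filter are hypothetical; they do not reproduce srai's actual implementation or the real contents of GEOFABRIK_LAYERS.

```python
from typing import Dict, List, Optional

# Hypothetical grouped filter: group name -> OSM key -> accepted values.
GroupedFilter = Dict[str, Dict[str, List[str]]]

toy_filter: GroupedFilter = {
    "minor_roads": {"highway": ["residential", "unclassified"]},
    "water": {"natural": ["water"], "landuse": ["reservoir"]},
}


def group_tags(tags: Dict[str, str], grouped_filter: GroupedFilter) -> Dict[str, Optional[str]]:
    """Map an element's tags to one 'key=value' string (or None) per group."""
    row: Dict[str, Optional[str]] = {}
    for group, tag_filter in grouped_filter.items():
        row[group] = None  # stays None when no tag of the element matches the group
        for key, accepted in tag_filter.items():
            if tags.get(key) in accepted:
                row[group] = f"{key}={tags[key]}"
                break  # keep the first matching tag only (a simplification)
    return row


row = group_tags({"highway": "residential", "name": "Example Street"}, toy_filter)
print(row)
```

Because the cell value carries both the key and the value, a check against a grouped column has to compare the full string, not just the tag value.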
Plot features
Inspired by https://snazzymaps.com/style/14889/flat-pale
In [13]:
ax = manhattan_bbox_gdf.plot(color="#e7e7df", figsize=(16, 16))

# plot greenery
new_york_features_gdf[new_york_features_gdf["leisure"] == "leisure=park"].plot(
    ax=ax, color="#bae5ce"
)

# plot water
new_york_features_gdf.dropna(subset=["water", "waterways"], how="all").plot(ax=ax, color="#c7eced")

# plot streets
new_york_features_gdf.dropna(subset=["paths_unsuitable_for_cars"], how="all").plot(
    ax=ax, color="#e7e7df", linewidth=1
)
new_york_features_gdf.dropna(
    subset=["very_small_roads", "highway_links", "minor_roads"], how="all"
).plot(ax=ax, color="#fff", linewidth=2)
new_york_features_gdf.dropna(subset=["major_roads"], how="all").plot(
    ax=ax, color="#fac9a9", linewidth=3
)

# plot buildings
new_york_features_gdf.dropna(subset=["buildings"], how="all").plot(ax=ax, color="#cecebd")

xmin, ymin, xmax, ymax = manhattan_bbox_gdf.total_bounds
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.set_axis_off()