OSMPbfLoader

srai.loaders.OSMPbfLoader(
    pbf_file=None,
    download_source="any",
    download_directory="files",
    verbosity_mode="transient",
)

Bases: OSMLoader

OSMPbfLoader.

The OSM (OpenStreetMap)[1] PBF (Protocolbuffer Binary Format)[2] loader loads OSM features from a PBF file. It filters features based on OSM tags, key:value pairs that OSM users attach to geometries to give them meaning.

This loader uses PbfFileReader from the QuackOSM[3] library, which relies on the duckdb[4] engine with the spatial[5] extension to parse *.osm.pbf files.

It can also download a PBF extract for a given area from several sources.

References
  1. https://www.openstreetmap.org/
  2. https://wiki.openstreetmap.org/wiki/PBF_Format
  3. https://github.com/kraina-ai/quackosm
  4. https://duckdb.org/
  5. https://github.com/duckdb/duckdb_spatial
PARAMETER DESCRIPTION
pbf_file

Path to a previously downloaded *.osm.pbf file for the loader to use. If not provided, an extract is downloaded automatically for the given area. Defaults to None.

TYPE: Union[str, Path] DEFAULT: None

download_source

Source to use when downloading PBF files. Can be one of: any, geofabrik, osmfr, bbbike. Defaults to "any".

TYPE: OsmExtractSource DEFAULT: 'any'

download_directory

Directory in which to save downloaded *.osm.pbf files. Ignored if pbf_file is provided. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Progress verbosity mode. Can be one of: silent, transient, verbose. Silent disables output completely. Transient tracks progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal['silent', 'transient', 'verbose'] DEFAULT: 'transient'
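
As a rough illustration of how these parameters fit together, the sketch below builds a loader and requests features for a small bounding box. The bounding-box coordinates, the TAGS filter, and the helper name load_example_features are assumptions made for this example. Calling the helper requires the optional osm dependencies and, because no pbf_file is given, network access to download an extract.

```python
from typing import Any

# Illustrative tag filter (an assumption for this example): parks plus all amenities.
TAGS = {"leisure": "park", "amenity": True}

# Illustrative bounding box as (min_lon, min_lat, max_lon, max_lat) in WGS84.
BBOX = (17.02, 51.10, 17.04, 51.11)


def load_example_features() -> Any:
    """Download an extract covering BBOX and return the matching features."""
    # Imported lazily so that defining this helper does not require srai.
    from shapely.geometry import box
    from srai.loaders import OSMPbfLoader

    loader = OSMPbfLoader(
        download_source="geofabrik",
        download_directory="files",
        verbosity_mode="silent",
    )
    # box() returns a shapely Polygon, which the download path expects.
    return loader.load(area=box(*BBOX), tags=TAGS)
```

To read from a local file instead, pass pbf_file to the constructor; download_directory is then ignored.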

Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
def __init__(
    self,
    pbf_file: Optional[Union[str, Path]] = None,
    download_source: "OsmExtractSource" = "any",
    download_directory: Union[str, Path] = "files",
    verbosity_mode: Literal["silent", "transient", "verbose"] = "transient",
) -> None:
    """
    Initialize OSMPbfLoader.

    Args:
        pbf_file (Union[str, Path], optional): Downloaded `*.osm.pbf` file to be used by
            the loader. If not provided, it will be automatically downloaded for a given area.
            Defaults to None.
        download_source (OsmExtractSource, optional): Source to use when downloading PBF files.
            Can be one of: `any`, `geofabrik`, `osmfr`, `bbbike`.
            Defaults to "any".
        download_directory (Union[str, Path], optional): Directory where to save the downloaded
            `*.osm.pbf` files. Ignored if `pbf_file` is provided. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
    """
    import_optional_dependencies(dependency_group="osm", modules=["quackosm"])
    self.pbf_file = pbf_file
    self.download_source = download_source
    self.download_directory = download_directory
    self.verbosity_mode = verbosity_mode

load(
    area,
    tags,
    ignore_cache=False,
    explode_tags=True,
    keep_all_tags=False,
)

Load OSM features with specified tags for a given area from an *.osm.pbf file.

The loader uses the provided *.osm.pbf file or downloads extracts automatically. It then parses and filters features from the files using PbfFileReader from the QuackOSM library and returns a GeoDataFrame containing the geometry column and columns for tag keys.

Note: Some key/value pairs might be missing from the resulting GeoDataFrame, simply because there are no such objects in the given area.

PARAMETER DESCRIPTION
area

Area for which to download objects.

TYPE: Union[BaseGeometry, Iterable[BaseGeometry], GeoSeries, GeoDataFrame]

tags

A dictionary specifying which tags to download. The keys should be OSM tags (e.g. building, amenity). Each value should be True to retrieve all objects with the tag, a string to retrieve a single tag-value pair, or a list of strings to retrieve all values in the list. tags={'leisure': 'park'} would return parks from the area. tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']} would return parks, all amenity types, bakeries and bicycle shops.

TYPE: Union[OsmTagsFilter, GroupedOsmTagsFilter]
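
The matching rules described for tags can be sketched in plain Python. This is only an illustration of the semantics for the three value kinds above (True, a string, a list of strings); the helper matches_tags_filter is hypothetical, is not part of srai or QuackOSM, and does not cover grouped filters.

```python
from typing import Dict, List, Union

TagValue = Union[bool, str, List[str]]


def matches_tags_filter(
    feature_tags: Dict[str, str], tags_filter: Dict[str, TagValue]
) -> bool:
    """Return True if at least one filter entry accepts the feature's tags."""
    for key, accepted in tags_filter.items():
        if key not in feature_tags:
            continue
        value = feature_tags[key]
        if accepted is True:
            # True accepts every value of this key.
            return True
        if isinstance(accepted, str) and value == accepted:
            # A string accepts exactly one tag-value pair.
            return True
        if isinstance(accepted, list) and value in accepted:
            # A list accepts any of the listed values.
            return True
    return False


tags_filter = {"leisure": "park", "amenity": True, "shop": ["bakery", "bicycle"]}

print(matches_tags_filter({"leisure": "park"}, tags_filter))   # True
print(matches_tags_filter({"amenity": "cafe"}, tags_filter))   # True
print(matches_tags_filter({"shop": "florist"}, tags_filter))   # False
```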

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split OSM tags into multiple columns or keep them in a single dict. Defaults to True.

TYPE: bool DEFAULT: True

keep_all_tags

Whether to keep all tags related to the element, or return only those defined in the tags filter. When True, it overrides the optional grouping defined in the tags filter. Defaults to False.

TYPE: bool DEFAULT: False
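
To make the effect of explode_tags more concrete, the sketch below contrasts the two result shapes using plain dictionaries. The row layout, the feature_id key, and the helper explode_tag_rows are assumptions chosen for illustration; the actual conversion is performed by QuackOSM and may differ in detail.

```python
from typing import Any, Dict, List


def explode_tag_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn rows holding a single 'tags' dict into rows with one column per tag key."""
    # Collect every tag key that appears in any row, in a stable order.
    keys = sorted({key for row in rows for key in row["tags"]})
    return [
        {"feature_id": row["feature_id"], **{key: row["tags"].get(key) for key in keys}}
        for row in rows
    ]


# Compact form: every feature keeps its tags in a single dict.
compact_rows = [
    {"feature_id": "node/1", "tags": {"amenity": "cafe"}},
    {"feature_id": "way/2", "tags": {"leisure": "park", "name": "Central"}},
]

# Exploded form: one column per tag key, with None where a feature lacks the tag.
exploded_rows = explode_tag_rows(compact_rows)
print(exploded_rows[0])
```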

RAISES DESCRIPTION
ValueError

If the PBF file has to be downloaded and the provided geometries are not shapely.geometry.Polygon objects.

RETURNS DESCRIPTION
GeoDataFrame

Downloaded features as a GeoDataFrame.

TYPE: GeoDataFrame

Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
def load(
    self,
    area: Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame],
    tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
    ignore_cache: bool = False,
    explode_tags: bool = True,
    keep_all_tags: bool = False,
) -> gpd.GeoDataFrame:
    """
    Load OSM features with specified tags for a given area from an `*.osm.pbf` file.

    The loader will use provided `*.osm.pbf` file, or download extracts
    automatically. Later it will parse and filter features from files
    using `PbfFileReader` from `QuackOSM` library. It will return a GeoDataFrame
    containing the `geometry` column and columns for tag keys.

    Note: Some key/value pairs might be missing from the resulting GeoDataFrame,
        simply because there are no such objects in the given area.

    Args:
        area (Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame]):
            Area for which to download objects.
        tags (Union[OsmTagsFilter, GroupedOsmTagsFilter]): A dictionary
            specifying which tags to download.
            The keys should be OSM tags (e.g. `building`, `amenity`).
            The values should either be `True` for retrieving all objects with the tag,
            string for retrieving a single tag-value pair
            or list of strings for retrieving all values specified in the list.
            `tags={'leisure': 'park'}` would return parks from the area.
            `tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']}`
            would return parks, all amenity types, bakeries and bicycle shops.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        explode_tags (bool, optional): Whether to split OSM tags into multiple columns or keep
            them in a single dict. Defaults to True.
        keep_all_tags (bool, optional): Whether to keep all tags related to the element,
            or return only those defined in the `tags` filter. When True, will override
            the optional grouping defined in the `tags` filter. Defaults to False.

    Raises:
        ValueError: If PBF file is expected to be downloaded and provided geometries
            aren't shapely.geometry.Polygons.

    Returns:
        gpd.GeoDataFrame: Downloaded features as a GeoDataFrame.
    """
    area_wgs84 = self._prepare_area_gdf(area)

    pbf_reader = self._get_pbf_file_reader(area_wgs84, tags)

    if self.pbf_file is not None:
        features_gdf = pbf_reader.convert_pbf_to_geodataframe(
            file_paths=self.pbf_file,
            keep_all_tags=keep_all_tags,
            explode_tags=explode_tags,
            ignore_cache=ignore_cache,
        )
    else:
        features_gdf = pbf_reader.convert_geometry_to_geodataframe(
            keep_all_tags=keep_all_tags, explode_tags=explode_tags, ignore_cache=ignore_cache
        )

    features_gdf = features_gdf.set_crs(WGS84_CRS)

    features_columns = [
        column
        for column in features_gdf.columns
        if column != GEOMETRY_COLUMN and features_gdf[column].notnull().any()
    ]
    features_gdf = features_gdf[[GEOMETRY_COLUMN, *sorted(features_columns)]]

    return features_gdf
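
The final column selection in load explains why some requested keys may be absent from the result: columns that contain only nulls are dropped, and the remaining tag columns are sorted after the geometry column. The sketch below mimics that step on a plain dictionary of columns. The helper select_feature_columns and the sample table are hypothetical stand-ins for the GeoDataFrame logic, not the library's code.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]


def select_feature_columns(table: Table, geometry_column: str = "geometry") -> Table:
    """Keep the geometry column first, then every non-empty column in sorted order."""
    feature_columns = [
        column
        for column, values in table.items()
        if column != geometry_column and any(value is not None for value in values)
    ]
    ordered = [geometry_column, *sorted(feature_columns)]
    return {column: table[column] for column in ordered}


table = {
    "shop": [None, "bakery"],
    "geometry": ["POINT (0 0)", "POINT (1 1)"],
    "leisure": [None, None],  # no parks in the area, so this column is dropped
    "amenity": ["cafe", None],
}

result = select_feature_columns(table)
print(list(result))  # ['geometry', 'amenity', 'shop']
```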

load_to_geoparquet(
    area,
    tags,
    ignore_cache=False,
    explode_tags=True,
    keep_all_tags=False,
)

Load OSM features with specified tags for a given area and save them to a GeoParquet file.

PARAMETER DESCRIPTION
area

Area for which to download objects.

TYPE: Union[BaseGeometry, Iterable[BaseGeometry], GeoSeries, GeoDataFrame]

tags

A dictionary specifying which tags to download. The keys should be OSM tags (e.g. building, amenity). Each value should be True to retrieve all objects with the tag, a string to retrieve a single tag-value pair, or a list of strings to retrieve all values in the list. tags={'leisure': 'park'} would return parks from the area. tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']} would return parks, all amenity types, bakeries and bicycle shops.

TYPE: Union[OsmTagsFilter, GroupedOsmTagsFilter]

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split OSM tags into multiple columns or keep them in a single dict. Defaults to True.

TYPE: bool DEFAULT: True

keep_all_tags

Whether to keep all tags related to the element, or return only those defined in the tags filter. When True, it overrides the optional grouping defined in the tags filter. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Path

Path to the saved GeoParquet file.

TYPE: Path
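
A possible way to use the returned path is to read the file back with GeoPandas. The sketch below assumes the optional osm dependencies and network access are available; the bounding box, the tag filter, and the helper name save_and_reload_features are illustrative assumptions rather than part of the library.

```python
from pathlib import Path
from typing import Any

# Illustrative tag filter and WGS84 bounding box (assumptions for this example).
SHOP_TAGS = {"shop": ["bakery", "bicycle"]}
SHOP_BBOX = (17.02, 51.10, 17.04, 51.11)


def save_and_reload_features() -> Any:
    """Save matching features to a GeoParquet file and read them back."""
    # Imported lazily so that defining this helper does not require srai.
    import geopandas as gpd
    from shapely.geometry import box
    from srai.loaders import OSMPbfLoader

    loader = OSMPbfLoader(verbosity_mode="silent")
    parquet_path: Path = loader.load_to_geoparquet(area=box(*SHOP_BBOX), tags=SHOP_TAGS)
    # The result is a regular GeoParquet file, so GeoPandas can read it directly.
    return gpd.read_parquet(parquet_path)
```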

Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
def load_to_geoparquet(
    self,
    area: Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame],
    tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
    ignore_cache: bool = False,
    explode_tags: bool = True,
    keep_all_tags: bool = False,
) -> Path:
    """
    Load OSM features with specified tags for a given area and save them to a GeoParquet file.

    Args:
        area (Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame]):
            Area for which to download objects.
        tags (Union[OsmTagsFilter, GroupedOsmTagsFilter]): A dictionary
            specifying which tags to download.
            The keys should be OSM tags (e.g. `building`, `amenity`).
            The values should either be `True` for retrieving all objects with the tag,
            string for retrieving a single tag-value pair
            or list of strings for retrieving all values specified in the list.
            `tags={'leisure': 'park'}` would return parks from the area.
            `tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']}`
            would return parks, all amenity types, bakeries and bicycle shops.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        explode_tags (bool, optional): Whether to split OSM tags into multiple columns or keep
            them in a single dict. Defaults to True.
        keep_all_tags (bool, optional): Whether to keep all tags related to the element,
            or return only those defined in the `tags` filter. When True, will override
            the optional grouping defined in the `tags` filter. Defaults to False.

    Returns:
        Path: Path to the saved GeoParquet file.
    """
    area_wgs84 = self._prepare_area_gdf(area)

    pbf_reader = self._get_pbf_file_reader(area_wgs84, tags)

    geoparquet_file_path: Path

    if self.pbf_file is not None:
        geoparquet_file_path = pbf_reader.convert_pbf_to_parquet(
            pbf_path=self.pbf_file,
            keep_all_tags=keep_all_tags,
            explode_tags=explode_tags,
            ignore_cache=ignore_cache,
        )
    else:
        geoparquet_file_path = pbf_reader.convert_geometry_to_parquet(
            keep_all_tags=keep_all_tags, explode_tags=explode_tags, ignore_cache=ignore_cache
        )

    return geoparquet_file_path