Pbf file reader

PBF File Reader.

This module contains a reader capable of parsing a PBF file into a GeoDataFrame.

PbfFileReader(
    tags_filter=None,
    geometry_filter=None,
    custom_sql_filter=None,
    working_directory="files",
    osm_way_polygon_features_config=None,
    parquet_compression="snappy",
    osm_extract_source=OsmExtractSource.any,
    verbosity_mode="transient",
    geometry_coverage_iou_threshold=0.01,
    allow_uncovered_geometry=False,
    debug_memory=False,
    debug_times=False,
)

PbfFileReader.

The PBF (Protocolbuffer Binary Format)[1] file reader is a dedicated reader class for *.osm.pbf files, based on DuckDB[2] and its spatial extension[3].

The handler can filter OSM features by tags and by geometry to limit the result.

References
  1. https://wiki.openstreetmap.org/wiki/PBF_Format
  2. https://duckdb.org/
  3. https://github.com/duckdb/duckdb_spatial
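
A minimal usage sketch, assuming the `quackosm` and `shapely` packages are installed. The helper name `parks_in_bbox` and its bounding-box arguments are illustrative and not part of the library; only the constructor parameters and the `convert_geometry_to_geodataframe` method documented on this page are used. The imports are deferred into the function body so that loading the snippet does not trigger any download.

```python
def parks_in_bbox(min_lon, min_lat, max_lon, max_lat, working_directory="files"):
    """Return a GeoDataFrame of parks inside a bounding box (hypothetical helper)."""
    # Deferred imports: both packages are assumed to be installed.
    from shapely.geometry import box
    from quackosm.pbf_file_reader import PbfFileReader

    reader = PbfFileReader(
        tags_filter={"leisure": "park"},
        geometry_filter=box(min_lon, min_lat, max_lon, max_lat),
        working_directory=working_directory,
    )
    # Finds and downloads the matching OSM extracts, then parses them.
    return reader.convert_geometry_to_geodataframe()
```

Calling the helper downloads extracts into `working_directory`, so the first run may take a while; later runs with the same filters can reuse the cached result files.
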
PARAMETER DESCRIPTION
tags_filter

A dictionary specifying which tags to download. The keys should be OSM tags (e.g. building, amenity). The values should be either True for retrieving all objects with the tag, a string for retrieving a single tag-value pair, or a list of strings for retrieving all values specified in the list. tags_filter={'leisure': 'park'} would return parks from the area. tags_filter={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']} would return parks, all amenity types, bakeries and bicycle shops. If None, the handler will allow all of the tags to be parsed. Defaults to None.

TYPE: Union[OsmTagsFilter, GroupedOsmTagsFilter] DEFAULT: None

geometry_filter

Region which can be used to filter only intersecting OSM objects. Defaults to None.

TYPE: BaseGeometry DEFAULT: None

custom_sql_filter

Allows users to pass custom SQL conditions used to filter OSM features. It will be embedded into predefined queries and requires DuckDB syntax to operate on tags map object. Defaults to None.

TYPE: str DEFAULT: None

working_directory

Directory where to save the parsed *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

osm_way_polygon_features_config

Config used to determine which closed way features are polygons. Modifications to this config are left for experienced OSM users. Defaults to the predefined "osm_way_polygon_features.json".

TYPE: Union[OsmWayPolygonConfig, dict[str, Any]] DEFAULT: None

parquet_compression

Compression of intermediate parquet files. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to "snappy".

TYPE: str DEFAULT: 'snappy'

osm_extract_source

A source for automatic downloading of OSM extracts. Can be Geofabrik, BBBike, OSMfr or any. Defaults to any.

TYPE: Union[OsmExtractSource, str] DEFAULT: any

verbosity_mode

Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes output after finished. Verbose leaves all progress outputs in the stdout. Defaults to "transient".

TYPE: Literal['silent', 'transient', 'verbose'] DEFAULT: 'transient'

geometry_coverage_iou_threshold

Minimal value of the Intersection over Union metric for selecting the matching OSM extracts. If the best matching extract has a value lower than the threshold, it is discarded (except the first one). Has to be in the range between 0 and 1. A value of 0 will allow every intersected extract; a value of 1 will only allow extracts that match the geometry exactly. Defaults to 0.01.

TYPE: float DEFAULT: 0.01

allow_uncovered_geometry

Suppress an error if some geometry parts aren't covered by any OSM extract. Defaults to False.

TYPE: bool DEFAULT: False

debug_memory

If turned on, will keep all temporary files after operation for debugging. Defaults to False.

TYPE: bool DEFAULT: False

debug_times

If turned on, will report the time, in seconds, at which each step was executed. Defaults to False.

TYPE: bool DEFAULT: False
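
The three value forms accepted by `tags_filter` (`True`, a single string, or a list of strings) can be illustrated with a small pure-Python matcher. This is only a sketch of the matching semantics, assuming a feature is kept when any filter entry matches its tags; `matches_tags_filter` is a hypothetical helper, not a QuackOSM function, and grouped filters are not covered.

```python
from typing import Union

TagsFilter = dict[str, Union[bool, str, list[str]]]


def matches_tags_filter(feature_tags: dict[str, str], tags_filter: TagsFilter) -> bool:
    """Return True if any filter entry matches the feature's tags."""
    for key, expected in tags_filter.items():
        if key not in feature_tags:
            continue
        value = feature_tags[key]
        if expected is True:
            return True  # any value of this key is accepted
        if isinstance(expected, str) and value == expected:
            return True  # single tag-value pair
        if isinstance(expected, list) and value in expected:
            return True  # one of the listed values
    return False


tags_filter = {"leisure": "park", "amenity": True, "shop": ["bakery", "bicycle"]}

print(matches_tags_filter({"leisure": "park"}, tags_filter))   # True
print(matches_tags_filter({"amenity": "cafe"}, tags_filter))   # True
print(matches_tags_filter({"shop": "florist"}, tags_filter))   # False
```
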

RAISES DESCRIPTION
InvalidGeometryFilter

When provided geometry filter has parts without area.
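
To make the `geometry_coverage_iou_threshold` parameter more concrete, the following sketch computes Intersection over Union for axis-aligned rectangles. Rectangles stand in for real geometries purely for illustration; `rect_area` and `rect_iou` are hypothetical helpers and do not reflect how QuackOSM computes the metric internally.

```python
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def rect_area(rect: Rect) -> float:
    """Area of a rectangle; zero if the rectangle is empty or inverted."""
    return max(0.0, rect[2] - rect[0]) * max(0.0, rect[3] - rect[1])


def rect_iou(a: Rect, b: Rect) -> float:
    """Intersection over Union of two axis-aligned rectangles."""
    intersection = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    intersection_area = rect_area(intersection)
    union_area = rect_area(a) + rect_area(b) - intersection_area
    return intersection_area / union_area if union_area > 0 else 0.0


query = (0.0, 0.0, 1.0, 1.0)            # area of interest
small_extract = (0.0, 0.0, 2.0, 2.0)    # IoU = 1 / 4
large_extract = (0.0, 0.0, 20.0, 20.0)  # IoU = 1 / 400

threshold = 0.01
print(rect_iou(query, small_extract) >= threshold)  # True: extract is kept
print(rect_iou(query, large_extract) >= threshold)  # False: below the threshold
```

A very large extract that merely contains a small query geometry scores a low IoU, which is why the threshold favours extracts that fit the geometry tightly.
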

Source code in quackosm/pbf_file_reader.py
def __init__(
    self,
    tags_filter: Optional[Union[OsmTagsFilter, GroupedOsmTagsFilter]] = None,
    geometry_filter: Optional[BaseGeometry] = None,
    custom_sql_filter: Optional[str] = None,
    working_directory: Union[str, Path] = "files",
    osm_way_polygon_features_config: Optional[
        Union[OsmWayPolygonConfig, dict[str, Any]]
    ] = None,
    parquet_compression: str = "snappy",
    osm_extract_source: Union[OsmExtractSource, str] = OsmExtractSource.any,
    verbosity_mode: Literal["silent", "transient", "verbose"] = "transient",
    geometry_coverage_iou_threshold: float = 0.01,
    allow_uncovered_geometry: bool = False,
    debug_memory: bool = False,
    debug_times: bool = False,
) -> None:
    """
    Initialize PbfFileReader.

    Args:
        tags_filter (Union[OsmTagsFilter, GroupedOsmTagsFilter], optional): A dictionary
            specifying which tags to download.
            The keys should be OSM tags (e.g. `building`, `amenity`).
            The values should either be `True` for retrieving all objects with the tag,
            string for retrieving a single tag-value pair
            or list of strings for retrieving all values specified in the list.
            `tags_filter={'leisure': 'park'}` would return parks from the area.
            `tags_filter={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']}`
            would return parks, all amenity types, bakeries and bicycle shops.
            If `None`, handler will allow all of the tags to be parsed. Defaults to `None`.
        geometry_filter (BaseGeometry, optional): Region which can be used to filter only
            intersecting OSM objects. Defaults to `None`.
        custom_sql_filter (str, optional): Allows users to pass custom SQL conditions used
            to filter OSM features. It will be embedded into predefined queries and requires
            DuckDB syntax to operate on tags map object. Defaults to `None`.
        working_directory (Union[str, Path], optional): Directory where to save
            the parsed `*.parquet` files. Defaults to "files".
        osm_way_polygon_features_config (Union[OsmWayPolygonConfig, dict[str, Any]], optional):
            Config used to determine which closed way features are polygons.
            Modifications to this config are left for experienced OSM users.
            Defaults to predefined "osm_way_polygon_features.json".
        parquet_compression (str, optional): Compression of intermediate parquet files.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to "snappy".
        osm_extract_source (Union[OsmExtractSource, str], optional): A source for automatic
            downloading of OSM extracts. Can be Geofabrik, BBBike, OSMfr or any.
            Defaults to `any`.
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        geometry_coverage_iou_threshold (float): Minimal value of the Intersection over Union
            metric for selecting the matching OSM extracts. If best matching extract has value
            lower than the threshold, it is discarded (except the first one). Has to be in range
            between 0 and 1. Value of 0 will allow every intersected extract, value of 1 will
            only allow extracts that match the geometry exactly. Defaults to 0.01.
        allow_uncovered_geometry (bool, optional): Suppress an error if some geometry parts
            aren't covered by any OSM extract. Defaults to `False`.
        debug_memory (bool, optional): If turned on, will keep all temporary files after
            operation for debugging. Defaults to `False`.
        debug_times (bool, optional): If turned on, will report the time, in seconds,
            at which each step was executed. Defaults to `False`.

    Raises:
        InvalidGeometryFilter: When provided geometry filter has parts without area.
    """
    self.geometry_filter = geometry_filter
    self._check_if_valid_geometry_filter()

    self.tags_filter = tags_filter
    self.is_tags_filter_positive = (
        check_if_any_osm_tags_filter_value_is_positive(self.tags_filter)
        if self.tags_filter is not None
        else False
    )
    self.expanded_tags_filter: Optional[Union[GroupedOsmTagsFilter, OsmTagsFilter]] = None
    self.merged_tags_filter: Optional[Union[GroupedOsmTagsFilter, OsmTagsFilter]] = None

    self.custom_sql_filter = custom_sql_filter

    self.geometry_coverage_iou_threshold = geometry_coverage_iou_threshold
    self.allow_uncovered_geometry = allow_uncovered_geometry
    self.osm_extract_source = osm_extract_source
    self.working_directory = Path(working_directory)
    self.working_directory.mkdir(parents=True, exist_ok=True)
    self.connection: duckdb.DuckDBPyConnection = None
    self.encountered_query_exception = False
    self.verbosity_mode = verbosity_mode
    self.debug_memory = debug_memory
    self.debug_times = debug_times
    self.task_progress_tracker: TaskProgressTracker = None
    self.rows_per_group: int = 0

    self.parquet_compression = parquet_compression

    if osm_way_polygon_features_config is None:
        # Config based on two sources + manual OSM wiki check
        # 1. https://github.com/tyrasd/osm-polygon-features/blob/v0.9.2/polygon-features.json
        # 2. https://github.com/ideditor/id-area-keys/blob/v5.0.1/areaKeys.json
        osm_way_polygon_features_config = json.loads(
            (Path(__file__).parent / "osm_way_polygon_features.json").read_text()
        )

    self.osm_way_polygon_features_config: OsmWayPolygonConfig = (
        osm_way_polygon_features_config
        if isinstance(osm_way_polygon_features_config, OsmWayPolygonConfig)
        else parse_dict_to_config_object(osm_way_polygon_features_config)
    )

    self.convert_pbf_to_gpq = deprecate(
        "convert_pbf_to_gpq",
        self.convert_pbf_to_parquet,
        "0.8.1",
        msg="Use `convert_pbf_to_parquet` instead. Deprecated since 0.8.1 version.",
    )

    self.convert_geometry_filter_to_gpq = deprecate(
        "convert_geometry_filter_to_gpq",
        self.convert_geometry_to_parquet,
        "0.8.1",
        msg="Use `convert_geometry_to_parquet` instead. Deprecated since 0.8.1 version.",
    )

    self.get_features_gdf = deprecate(
        "get_features_gdf",
        self.convert_pbf_to_geodataframe,
        "0.8.1",
        msg="Use `convert_pbf_to_geodataframe` instead. Deprecated since 0.8.1 version.",
    )

    self.get_features_gdf_from_geometry = deprecate(
        "get_features_gdf_from_geometry",
        self.convert_geometry_to_geodataframe,
        "0.8.1",
        msg="Use `convert_geometry_to_geodataframe` instead. Deprecated since 0.8.1 version.",
    )

ConvertedOSMParquetFiles

Bases: NamedTuple

List of parquet files read from the *.osm.pbf file.

convert_pbf_to_parquet(
    pbf_path,
    result_file_path=None,
    keep_all_tags=False,
    explode_tags=None,
    ignore_cache=False,
    filter_osm_ids=None,
    save_as_wkt=False,
    pbf_extract_geometry=None,
)

Convert PBF file to GeoParquet file.

PARAMETER DESCRIPTION
pbf_path

Path or list of paths of *.osm.pbf files to be parsed. Can be a URL.

TYPE: Union[str, Path, Iterable[Union[str, Path]]]

result_file_path

Where to save the geoparquet file. If not provided, will be generated based on hashes from provided tags filter and geometry filter. Defaults to None.

TYPE: Union[str, Path] DEFAULT: None

keep_all_tags

Works only with the tags_filter parameter. Whether to keep all tags related to the element, or return only those defined in the tags_filter. When True, will override the optional grouping defined in the tags_filter. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split tags into columns based on OSM tag keys. If None, will be set based on tags_filter and keep_all_tags parameters. If there is tags filter defined and keep_all_tags is set to False, then it will be set to True. Otherwise it will be set to False. Defaults to None.

TYPE: bool DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

filter_osm_ids

List of OSM feature IDs to read from the file. They have to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'. Defaults to an empty list.

TYPE: Optional[list[str]] DEFAULT: None

save_as_wkt

Whether to save the file with geometry in the WKT form instead of WKB. If True, it will be saved as a .parquet file, because it won't be in the GeoParquet standard. Defaults to False.

TYPE: bool DEFAULT: False

pbf_extract_geometry

List of geometries defining PBF extract. Used internally to speed up intersections for complex filters. Defaults to None.

TYPE: Optional[Union[BaseGeometry, Iterable[BaseGeometry]]] DEFAULT: None
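
The default for `explode_tags` can be written as a small function, mirroring the expression used in the method's source. Note that the source also requires the tags filter to be "positive" (the `is_tags_filter_positive` attribute set in the constructor). The function name `resolve_explode_tags` and its flattened arguments are illustrative only.

```python
from typing import Optional


def resolve_explode_tags(
    explode_tags: Optional[bool],
    has_tags_filter: bool,
    is_tags_filter_positive: bool,
    keep_all_tags: bool,
) -> bool:
    """Return the effective explode_tags value (hypothetical helper)."""
    if explode_tags is not None:
        # An explicit user choice always wins.
        return explode_tags
    # Explode only when a positive tags filter is set and not all tags are kept.
    return has_tags_filter and is_tags_filter_positive and not keep_all_tags


print(resolve_explode_tags(None, True, True, False))    # True
print(resolve_explode_tags(None, True, True, True))     # False
print(resolve_explode_tags(None, False, False, False))  # False
print(resolve_explode_tags(False, True, True, False))   # False
```
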

RETURNS DESCRIPTION
Path

Path to the generated GeoParquet file.

TYPE: Path
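
The `filter_osm_ids` values follow the `'node/<id>'`, `'way/<id>'` and `'relation/<id>'` pattern. A quick pre-check can catch malformed entries before a long conversion starts. The sketch below assumes that `<id>` is a non-negative integer; `invalid_osm_ids` is a hypothetical helper and is not part of QuackOSM.

```python
import re

# Assumption: the <id> part consists of decimal digits only.
OSM_ID_PATTERN = re.compile(r"(node|way|relation)/\d+")


def invalid_osm_ids(osm_ids: list[str]) -> list[str]:
    """Return the entries that do not look like '<type>/<id>'."""
    return [osm_id for osm_id in osm_ids if not OSM_ID_PATTERN.fullmatch(osm_id)]


print(invalid_osm_ids(["node/42", "way/123", "relation/7"]))  # []
print(invalid_osm_ids(["way/123", "123", "area/5", "node/"]))  # ['123', 'area/5', 'node/']
```
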

Source code in quackosm/pbf_file_reader.py
def convert_pbf_to_parquet(
    self,
    pbf_path: Union[str, Path, Iterable[Union[str, Path]]],
    result_file_path: Optional[Union[str, Path]] = None,
    keep_all_tags: bool = False,
    explode_tags: Optional[bool] = None,
    ignore_cache: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
    save_as_wkt: bool = False,
    pbf_extract_geometry: Optional[Union[BaseGeometry, Iterable[BaseGeometry]]] = None,
) -> Path:
    """
    Convert PBF file to GeoParquet file.

    Args:
        pbf_path (Union[str, Path, Iterable[Union[str, Path]]]):
            Path or list of paths of `*.osm.pbf` files to be parsed. Can be a URL.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from provided tags filter and geometry filter. Defaults to `None`.
        keep_all_tags (bool, optional): Works only with the `tags_filter` parameter.
            Whether to keep all tags related to the element, or return only those defined
            in the `tags_filter`. When `True`, will override the optional grouping defined
            in the `tags_filter`. Defaults to `False`.
        explode_tags (bool, optional): Whether to split tags into columns based on OSM tag keys.
            If `None`, will be set based on `tags_filter` and `keep_all_tags` parameters.
            If there is tags filter defined and `keep_all_tags` is set to `False`, then it will
            be set to `True`. Otherwise it will be set to `False`. Defaults to `None`.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        filter_osm_ids (list[str], optional): List of OSM features ids to read from the file.
            Have to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'.
            Defaults to an empty list.
        save_as_wkt (bool): Whether to save the file with geometry in the WKT form instead
            of WKB. If `True`, it will be saved as a `.parquet` file, because it won't be
            in the GeoParquet standard. Defaults to `False`.
        pbf_extract_geometry (Optional[Union[BaseGeometry, Iterable[BaseGeometry]]], optional):
            List of geometries defining PBF extract. Used internally to speed up intersections
            for complex filters. Defaults to `None`.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    if isinstance(pbf_path, (str, Path)):
        pbf_path = [pbf_path]
    else:
        pbf_path = list(pbf_path)

    if pbf_extract_geometry is not None:
        if isinstance(pbf_extract_geometry, BaseGeometry):
            pbf_extract_geometry = [pbf_extract_geometry]
        else:
            pbf_extract_geometry = list(pbf_extract_geometry)
            if len(pbf_extract_geometry) != len(pbf_path):
                raise AttributeError(
                    "Provided pbf_extract_geometry has a different length "
                    "than the list of pbf paths."
                )

    if filter_osm_ids is None:
        filter_osm_ids = []

    if explode_tags is None:
        explode_tags = (
            self.tags_filter is not None and self.is_tags_filter_positive and not keep_all_tags
        )

    parsed_geoparquet_files = []
    total_files = len(pbf_path)
    self.task_progress_tracker = TaskProgressTracker(
        verbosity_mode=self.verbosity_mode,
        total_major_steps=total_files,
        debug=self.debug_times,
    )
    if total_files == 1:
        single_pbf_extract_geometry = None
        if pbf_extract_geometry is not None:
            single_pbf_extract_geometry = pbf_extract_geometry[0]
        parsed_geoparquet_file = self._convert_single_pbf_to_parquet(
            pbf_path[0],
            result_file_path=result_file_path,
            keep_all_tags=keep_all_tags,
            explode_tags=explode_tags,
            ignore_cache=ignore_cache,
            filter_osm_ids=filter_osm_ids,
            save_as_wkt=save_as_wkt,
            pbf_extract_geometry=single_pbf_extract_geometry,
        )
        self.task_progress_tracker.stop()
        return parsed_geoparquet_file
    else:
        result_file_path = Path(
            result_file_path
            or self._generate_result_file_path(
                pbf_path,
                filter_osm_ids=filter_osm_ids,
                keep_all_tags=keep_all_tags,
                explode_tags=explode_tags,
                save_as_wkt=save_as_wkt,
            )
        )

        if result_file_path.exists() and not ignore_cache:
            return result_file_path
        elif result_file_path.with_suffix(".geoparquet").exists() and not ignore_cache:
            warnings.warn(
                (
                    "Found existing result file with `.geoparquet` extension."
                    " Users are encouraged to change the extension manually"
                    " to `.parquet` for old files. Files with `.geoparquet`"
                    " extension will be backwards supported, but reusing them"
                    " will result in this warning."
                ),
                DeprecationWarning,
                stacklevel=0,
            )
            return result_file_path.with_suffix(".geoparquet")

        for file_idx, single_pbf_path in enumerate(pbf_path):
            self.task_progress_tracker.reset_steps(file_idx + 1)

            single_pbf_extract_geometry = None
            if pbf_extract_geometry is not None:
                single_pbf_extract_geometry = pbf_extract_geometry[file_idx]

            parsed_geoparquet_file = self._convert_single_pbf_to_parquet(
                single_pbf_path,
                keep_all_tags=keep_all_tags,
                explode_tags=explode_tags,
                ignore_cache=ignore_cache,
                filter_osm_ids=filter_osm_ids,
                save_as_wkt=save_as_wkt,
                pbf_extract_geometry=single_pbf_extract_geometry,
            )
            parsed_geoparquet_files.append(parsed_geoparquet_file)

        if parsed_geoparquet_files:
            with tempfile.TemporaryDirectory(
                dir=self.working_directory.resolve()
            ) as tmp_dir_name:
                if self.debug_memory:
                    tmp_dir_name = self._prepare_debug_directory()  # type: ignore[assignment] # noqa: PLW2901
                tmp_dir_path = Path(tmp_dir_name)

                try:
                    parquet_files_without_duplicates = (
                        self._drop_duplicated_features_in_pyarrow_table(
                            parsed_geoparquet_files=parsed_geoparquet_files,
                            tmp_dir_path=tmp_dir_path,
                        )
                    )
                except (pa.ArrowInvalid, MemoryError, MultiprocessingRuntimeError):
                    try:
                        parquet_files_without_duplicates = (
                            self._drop_duplicated_features_in_joined_table(
                                parsed_geoparquet_files=parsed_geoparquet_files,
                                tmp_dir_path=tmp_dir_path,
                            )
                        )
                    except MemoryError:
                        parquet_files_without_duplicates = (
                            self._drop_duplicated_features_in_joined_table_one_by_one(
                                parsed_geoparquet_files=parsed_geoparquet_files,
                                tmp_dir_path=tmp_dir_path,
                            )
                        )

                self._combine_parquet_files(
                    parquet_files_without_duplicates,
                    result_file_path=result_file_path,
                    save_as_wkt=save_as_wkt,
                )
        else:
            warnings.warn(
                "Found 0 extracts covering the geometry. Returning empty result.",
                EmptyResultWarning,
                stacklevel=0,
            )
            if save_as_wkt:
                geometry_column = ga.as_wkt(gpd.GeoSeries([], crs=WGS84_CRS))
            else:
                geometry_column = ga.as_wkb(gpd.GeoSeries([], crs=WGS84_CRS))
            joined_parquet_table = pa.table(
                [pa.array([], type=pa.string()), geometry_column],
                names=[FEATURES_INDEX, GEOMETRY_COLUMN],
            )
            if save_as_wkt:
                pq.write_table(joined_parquet_table, result_file_path)
            else:
                io.write_geoparquet_table(
                    joined_parquet_table,
                    result_file_path,
                    primary_geometry_column=GEOMETRY_COLUMN,
                )

        self.task_progress_tracker.stop()

    return Path(result_file_path)
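
The cache check in the source above can be read in isolation as follows. This is a simplified sketch of that branch only: `find_cached_result` is a hypothetical helper, the warning text is shortened, and `stacklevel=2` is chosen here so the warning points at the caller.

```python
import tempfile
import warnings
from pathlib import Path
from typing import Optional


def find_cached_result(result_file_path: Path, ignore_cache: bool = False) -> Optional[Path]:
    """Return an existing result file to reuse, or None if conversion is needed."""
    if ignore_cache:
        return None
    if result_file_path.exists():
        return result_file_path
    # Older versions saved results with a `.geoparquet` extension.
    legacy_path = result_file_path.with_suffix(".geoparquet")
    if legacy_path.exists():
        warnings.warn(
            "Found existing result file with `.geoparquet` extension.",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy_path
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    expected = Path(tmp_dir) / "result.parquet"
    print(find_cached_result(expected))                     # None
    expected.touch()
    print(find_cached_result(expected) == expected)         # True
    print(find_cached_result(expected, ignore_cache=True))  # None
```
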

convert_geometry_to_parquet(
    result_file_path=None,
    keep_all_tags=False,
    explode_tags=None,
    ignore_cache=False,
    filter_osm_ids=None,
    save_as_wkt=False,
)

Convert geometry to GeoParquet file.

Will automatically find and download OSM extracts covering a given geometry and convert them to a single GeoParquet file.

PARAMETER DESCRIPTION
result_file_path

Where to save the geoparquet file. If not provided, will be generated based on hashes from provided tags filter and geometry filter. Defaults to None.

TYPE: Union[str, Path] DEFAULT: None

keep_all_tags

Works only with the tags_filter parameter. Whether to keep all tags related to the element, or return only those defined in the tags_filter. When True, will override the optional grouping defined in the tags_filter. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split tags into columns based on OSM tag keys. If None, will be set based on tags_filter and keep_all_tags parameters. If there is tags filter defined and keep_all_tags is set to False, then it will be set to True. Otherwise it will be set to False. Defaults to None.

TYPE: bool DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

filter_osm_ids

List of OSM feature IDs to read from the file. They have to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'. Defaults to an empty list.

TYPE: Optional[list[str]] DEFAULT: None

save_as_wkt

Whether to save the file with geometry in the WKT form instead of WKB. If True, it will be saved as a .parquet file, because it won't be in the GeoParquet standard. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Path

Path to the generated GeoParquet file.

TYPE: Path
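
The difference between the two encodings selected by `save_as_wkt` can be seen on a single point. The sketch below builds both forms by hand with the standard library: WKT is human-readable text, while WKB is a compact binary layout (a byte-order flag, a geometry type code, then the coordinates as doubles). The helper names are illustrative; QuackOSM relies on its own dependencies for the actual encoding.

```python
import struct


def point_to_wkt(x: float, y: float) -> str:
    """Encode a 2D point as Well-Known Text."""
    return f"POINT ({x} {y})"


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian Well-Known Binary."""
    # 1 = little-endian byte order, 1 = geometry type Point, then x and y.
    return struct.pack("<BIdd", 1, 1, x, y)


wkt = point_to_wkt(21.0, 52.25)
wkb = point_to_wkb(21.0, 52.25)

print(wkt)            # POINT (21.0 52.25)
print(len(wkb))       # 21
print(wkb.hex()[:10])  # 0101000000
```
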

Source code in quackosm/pbf_file_reader.py
def convert_geometry_to_parquet(
    self,
    result_file_path: Optional[Union[str, Path]] = None,
    keep_all_tags: bool = False,
    explode_tags: Optional[bool] = None,
    ignore_cache: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
    save_as_wkt: bool = False,
) -> Path:
    """
    Convert geometry to GeoParquet file.

    Will automatically find and download OSM extracts covering a given geometry
    and convert them to a single GeoParquet file.

    Args:
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from provided tags filter and geometry filter. Defaults to `None`.
        keep_all_tags (bool, optional): Works only with the `tags_filter` parameter.
            Whether to keep all tags related to the element, or return only those defined
            in the `tags_filter`. When `True`, will override the optional grouping defined
            in the `tags_filter`. Defaults to `False`.
        explode_tags (bool, optional): Whether to split tags into columns based on OSM tag keys.
            If `None`, will be set based on `tags_filter` and `keep_all_tags` parameters.
            If there is tags filter defined and `keep_all_tags` is set to `False`, then it will
            be set to `True`. Otherwise it will be set to `False`. Defaults to `None`.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        filter_osm_ids (list[str], optional): List of OSM features ids to read from the file.
            Have to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'.
            Defaults to an empty list.
        save_as_wkt (bool): Whether to save the file with geometry in the WKT form instead
            of WKB. If `True`, it will be saved as a `.parquet` file, because it won't be
            in the GeoParquet standard. Defaults to `False`.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    if self.geometry_filter is None:
        raise AttributeError(
            "Cannot find matching OSM extracts without geometry filter. Please configure"
            " geometry_filter first: PbfFileReader(geometry_filter=..., **kwargs)."
        )

    if filter_osm_ids is None:
        filter_osm_ids = []

    if explode_tags is None:
        explode_tags = (
            self.tags_filter is not None and self.is_tags_filter_positive and not keep_all_tags
        )

    result_file_path = Path(
        result_file_path
        or self._generate_result_file_path_from_geometry(
            filter_osm_ids=filter_osm_ids,
            keep_all_tags=keep_all_tags,
            explode_tags=explode_tags,
            save_as_wkt=save_as_wkt,
        )
    )

    if result_file_path.exists() and not ignore_cache:
        return result_file_path
    elif result_file_path.with_suffix(".geoparquet").exists() and not ignore_cache:
        warnings.warn(
            (
                "Found existing result file with `.geoparquet` extension."
                " Users are encouraged to change the extension manually"
                " to `.parquet` for old files. Files with `.geoparquet`"
                " extension will be backwards supported, but reusing them"
                " will result in this warning."
            ),
            DeprecationWarning,
            stacklevel=0,
        )
        return result_file_path.with_suffix(".geoparquet")

    matching_extracts = find_smallest_containing_extracts(
        self.geometry_filter,
        self.osm_extract_source,
        geometry_coverage_iou_threshold=self.geometry_coverage_iou_threshold,
        allow_uncovered_geometry=self.allow_uncovered_geometry,
    )
    pbf_files = download_extracts_pbf_files(
        matching_extracts, self.working_directory, progressbar=self.verbosity_mode != "silent"
    )
    return self.convert_pbf_to_parquet(
        pbf_files,
        result_file_path=result_file_path,
        keep_all_tags=keep_all_tags,
        explode_tags=explode_tags,
        ignore_cache=ignore_cache,
        filter_osm_ids=filter_osm_ids,
        save_as_wkt=save_as_wkt,
        pbf_extract_geometry=[
            matching_extract.geometry for matching_extract in matching_extracts
        ],
    )
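
When `result_file_path` is omitted, the result name is generated from hashes of the provided filters, which is what lets repeated calls with the same filters find the cached file. The exact naming scheme is not shown on this page, so the sketch below is an invented illustration of the idea only: the `cache_file_name` helper, the SHA-256 digest, and the file name layout are all assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import Any


def cache_file_name(
    prefix: str,
    tags_filter: dict[str, Any],
    geometry_wkt: str,
    directory: str = "files",
) -> Path:
    """Build a deterministic result path from the filters (hypothetical scheme)."""
    # sort_keys makes the hash independent of dictionary ordering.
    payload = json.dumps({"tags": tags_filter, "geometry": geometry_wkt}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return Path(directory) / f"{prefix}_{digest}.parquet"


first = cache_file_name("monaco", {"leisure": "park"}, "POINT (0 0)")
second = cache_file_name("monaco", {"leisure": "park"}, "POINT (0 0)")
other = cache_file_name("monaco", {"amenity": True}, "POINT (0 0)")

print(first == second)  # True: same filters, same file
print(first == other)   # False: different filters, different file
```
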

convert_pbf_to_geodataframe(
    pbf_path,
    keep_all_tags=False,
    explode_tags=None,
    ignore_cache=False,
    filter_osm_ids=None,
)

Get features GeoDataFrame from a list of PBF files.

Function parses multiple PBF files and returns a single GeoDataFrame with parsed OSM objects.

PARAMETER DESCRIPTION
pbf_path

Path or list of paths of *.osm.pbf files to be parsed. Can be a URL.

TYPE: Union[str, Path, Iterable[Union[str, Path]]]

keep_all_tags

Works only with the tags_filter parameter. Whether to keep all tags related to the element, or return only those defined in the tags_filter. When True, will override the optional grouping defined in the tags_filter. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split tags into columns based on OSM tag keys. If None, will be set based on tags_filter and keep_all_tags parameters. If there is tags filter defined and keep_all_tags is set to False, then it will be set to True. Otherwise it will be set to False. Defaults to None.

TYPE: bool DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

filter_osm_ids

List of OSM feature IDs to read from the file. They have to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'. Defaults to an empty list.

TYPE: Optional[list[str]] DEFAULT: None

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: GeoDataFrame with OSM features.
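
Since `pbf_path` accepts either a single path or an iterable of paths, the methods first normalize it to a list. A standalone sketch of that step is shown below; `normalize_pbf_paths` is an illustrative name. The `str` check has to come first because a string is itself iterable and would otherwise be split into characters.

```python
from collections.abc import Iterable
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def normalize_pbf_paths(pbf_path: Union[PathLike, Iterable[PathLike]]) -> list[PathLike]:
    """Wrap a single path in a list, or materialize an iterable of paths."""
    if isinstance(pbf_path, (str, Path)):
        return [pbf_path]
    return list(pbf_path)


print(normalize_pbf_paths("monaco.osm.pbf"))           # ['monaco.osm.pbf']
print(normalize_pbf_paths(("a.osm.pbf", "b.osm.pbf")))  # ['a.osm.pbf', 'b.osm.pbf']
```
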

Source code in quackosm/pbf_file_reader.py
@deprecate_kwarg(old_arg_name="file_paths", new_arg_name="pbf_path")  # type: ignore
def convert_pbf_to_geodataframe(
    self,
    pbf_path: Union[str, Path, Iterable[Union[str, Path]]],
    keep_all_tags: bool = False,
    explode_tags: Optional[bool] = None,
    ignore_cache: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
) -> gpd.GeoDataFrame:
    """
    Get features GeoDataFrame from a list of PBF files.

    Function parses multiple PBF files and returns a single GeoDataFrame with parsed
    OSM objects.

    Args:
        pbf_path (Union[str, Path, Iterable[Union[str, Path]]]):
            Path or list of paths of `*.osm.pbf` files to be parsed. Can be a URL.
        keep_all_tags (bool, optional): Works only with the `tags_filter` parameter.
            Whether to keep all tags related to the element, or return only those defined
            in the `tags_filter`. When `True`, will override the optional grouping defined
            in the `tags_filter`. Defaults to `False`.
        explode_tags (bool, optional): Whether to split tags into columns based on OSM tag keys.
            If `None`, will be set based on `tags_filter` and `keep_all_tags` parameters.
            If there is tags filter defined and `keep_all_tags` is set to `False`, then it will
            be set to `True`. Otherwise it will be set to `False`. Defaults to `None`.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        filter_osm_ids (list[str], optional): List of OSM features ids to read from the file.
            Have to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'.
            Defaults to an empty list.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with OSM features.
    """
    if isinstance(pbf_path, (str, Path)):
        pbf_path = [pbf_path]

    parsed_geoparquet_file = self.convert_pbf_to_parquet(
        pbf_path=pbf_path,
        keep_all_tags=keep_all_tags,
        explode_tags=explode_tags,
        ignore_cache=ignore_cache,
        filter_osm_ids=filter_osm_ids,
    )
    joined_parquet_table = io.read_geoparquet_table(parsed_geoparquet_file)
    gdf_parquet = gpd.GeoDataFrame(
        data=joined_parquet_table.drop(GEOMETRY_COLUMN).to_pandas(maps_as_pydicts="strict"),
        geometry=ga.to_geopandas(joined_parquet_table.column(GEOMETRY_COLUMN)),
    ).set_index(FEATURES_INDEX)

    return gdf_parquet
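
The `filter_osm_ids` argument expects ids in the form `'node/<id>'`, `'way/<id>'` or `'relation/<id>'`. A minimal sketch of checking that format before calling the reader could look like the following. The helper name `find_invalid_osm_ids` and the assumption that `<id>` is a non-negative integer are illustrative only and are not part of QuackOSM.

```python
import re
from typing import Iterable

# Assumption: an OSM feature id is "<type>/<digits>", with type one of
# node, way or relation. This mirrors the format described in the docstring.
OSM_ID_PATTERN = re.compile(r"(node|way|relation)/\d+")


def find_invalid_osm_ids(osm_ids: Iterable[str]) -> list[str]:
    """Return the ids that do not match the '<type>/<id>' format."""
    return [osm_id for osm_id in osm_ids if not OSM_ID_PATTERN.fullmatch(osm_id)]


# Hypothetical ids used only for illustration.
candidate_ids = ["node/1", "way/22", "relation/333", "area/5", "way/"]
print(find_invalid_osm_ids(candidate_ids))
```

Running such a check up front makes it easier to spot ids that lost their numeric part, for example `'way/'`, before a long conversion starts.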

convert_geometry_to_geodataframe(
    keep_all_tags=False,
    explode_tags=None,
    ignore_cache=False,
    filter_osm_ids=None,
)

Get features GeoDataFrame from a provided geometry filter.

Will automatically find and download OSM extracts covering a given geometry and return a single GeoDataFrame with parsed OSM objects.

PARAMETER DESCRIPTION
keep_all_tags

Works only with the tags_filter parameter. Whether to keep all tags related to the element, or return only those defined in the tags_filter. When True, will override the optional grouping defined in the tags_filter. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split tags into columns based on OSM tag keys. If None, will be set based on tags_filter and keep_all_tags parameters. If there is tags filter defined and keep_all_tags is set to False, then it will be set to True. Otherwise it will be set to False. Defaults to None.

TYPE: bool DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

filter_osm_ids

List of OSM feature ids to read from the file. Each id has to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'. Defaults to None, which is treated as an empty list.

TYPE: Optional[list[str]] DEFAULT: None

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: GeoDataFrame with OSM features.
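
The rule for the `explode_tags` default described above can be written as a small function. This is a sketch of the documented behaviour only, not the library's internal code; `resolve_explode_tags` and the `has_tags_filter` flag are hypothetical names.

```python
from typing import Optional


def resolve_explode_tags(
    explode_tags: Optional[bool],
    has_tags_filter: bool,
    keep_all_tags: bool,
) -> bool:
    """Resolve the effective explode_tags value as the documentation describes it."""
    # An explicit True or False from the caller is always respected.
    if explode_tags is not None:
        return explode_tags
    # With None: explode only when a tags filter exists and not all tags are kept.
    return has_tags_filter and not keep_all_tags


print(resolve_explode_tags(None, has_tags_filter=True, keep_all_tags=False))
print(resolve_explode_tags(None, has_tags_filter=True, keep_all_tags=True))
```

Under this reading, the first call resolves to `True` (tags split into columns) and the second to `False` (tags kept as a single map column).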

Source code in quackosm/pbf_file_reader.py
def convert_geometry_to_geodataframe(
    self,
    keep_all_tags: bool = False,
    explode_tags: Optional[bool] = None,
    ignore_cache: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
) -> gpd.GeoDataFrame:
    """
    Get features GeoDataFrame from a provided geometry filter.

    Will automatically find and download OSM extracts covering a given geometry
    and return a single GeoDataFrame with parsed OSM objects.

    Args:
        keep_all_tags (bool, optional): Works only with the `tags_filter` parameter.
            Whether to keep all tags related to the element, or return only those defined
            in the `tags_filter`. When `True`, will override the optional grouping defined
            in the `tags_filter`. Defaults to `False`.
        explode_tags (bool, optional): Whether to split tags into columns based on OSM tag keys.
            If `None`, will be set based on `tags_filter` and `keep_all_tags` parameters.
            If there is tags filter defined and `keep_all_tags` is set to `False`, then it will
            be set to `True`. Otherwise it will be set to `False`. Defaults to `None`.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to `False`.
        filter_osm_ids (list[str], optional): List of OSM feature ids to read from the file.
            Each id has to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'.
            Defaults to `None`, which is treated as an empty list.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with OSM features.
    """
    parsed_geoparquet_file = self.convert_geometry_to_parquet(
        keep_all_tags=keep_all_tags,
        explode_tags=explode_tags,
        ignore_cache=ignore_cache,
        filter_osm_ids=filter_osm_ids,
    )
    joined_parquet_table = io.read_geoparquet_table(parsed_geoparquet_file)
    gdf_parquet = gpd.GeoDataFrame(
        data=joined_parquet_table.drop(GEOMETRY_COLUMN).to_pandas(maps_as_pydicts="strict"),
        geometry=ga.to_geopandas(joined_parquet_table.column(GEOMETRY_COLUMN)),
    ).set_index(FEATURES_INDEX)

    return gdf_parquet

convert_pbf_to_duckdb(
    pbf_path,
    result_file_path=None,
    keep_all_tags=False,
    explode_tags=None,
    ignore_cache=False,
    filter_osm_ids=None,
    duckdb_table_name="quackosm",
)

Convert PBF file to DuckDB Database.

Function parses multiple PBF files and saves the parsed OSM objects into a single table in a DuckDB database file.

PARAMETER DESCRIPTION
pbf_path

Path or list of paths of *.osm.pbf files to be parsed. Can be a URL.

TYPE: Union[str, Path, Iterable[Union[str, Path]]]

result_file_path

Where to save the duckdb file. If not provided, will be generated based on hashes from provided tags filter and geometry filter. Defaults to None.

TYPE: Union[str, Path] DEFAULT: None

keep_all_tags

Works only with the tags_filter parameter. Whether to keep all tags related to the element, or return only those defined in the tags_filter. When True, will override the optional grouping defined in the tags_filter. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split tags into columns based on OSM tag keys. If None, will be set based on tags_filter and keep_all_tags parameters. If there is tags filter defined and keep_all_tags is set to False, then it will be set to True. Otherwise it will be set to False. Defaults to None.

TYPE: bool DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

filter_osm_ids

List of OSM feature ids to read from the file. Each id has to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'. Defaults to None, which is treated as an empty list.

TYPE: Optional[list[str]] DEFAULT: None

duckdb_table_name

Table name in which data will be stored inside the DuckDB database (default: "quackosm")

TYPE: str DEFAULT: 'quackosm'

RETURNS DESCRIPTION
Path

Path to the generated DuckDB database file.
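
How `result_file_path` is resolved follows from the source listing: a path supplied by the caller is used as given, while an automatically generated path has its suffix replaced with `.duckdb`. The sketch below reproduces only that expression. `resolve_duckdb_path` and the example file names are hypothetical, and the real generated name is produced by a private helper whose output is not shown here.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_duckdb_path(
    result_file_path: Optional[Union[str, Path]],
    generated_path: Path,
) -> Path:
    """Pick the caller's path if given, else the generated path with a .duckdb suffix."""
    # `.with_suffix` binds tighter than `or`, so it applies only to the fallback.
    return Path(result_file_path or generated_path.with_suffix(".duckdb"))


# Hypothetical generated file name, for illustration only.
generated = Path("files") / "example_extract.parquet"

print(resolve_duckdb_path(None, generated))
print(resolve_duckdb_path("output/osm_data.db", generated))
```

One consequence of this design is that a caller-supplied path keeps whatever extension it has; the `.duckdb` suffix is enforced only for generated paths.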

Source code in quackosm/pbf_file_reader.py
def convert_pbf_to_duckdb(
    self,
    pbf_path: Union[str, Path, Iterable[Union[str, Path]]],
    result_file_path: Optional[Union[str, Path]] = None,
    keep_all_tags: bool = False,
    explode_tags: Optional[bool] = None,
    ignore_cache: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
    duckdb_table_name: Optional[str] = "quackosm",
) -> Path:
    """
    Convert PBF file to DuckDB Database.

    Function parses multiple PBF files and saves the parsed OSM objects
    into a single table in a DuckDB database file.

    Args:
        pbf_path (Union[str, Path, Iterable[Union[str, Path]]]):
            Path or list of paths of `*.osm.pbf` files to be parsed. Can be a URL.
        result_file_path (Union[str, Path], optional): Where to save
            the duckdb file. If not provided, will be generated based on hashes
            from provided tags filter and geometry filter. Defaults to `None`.
        keep_all_tags (bool, optional): Works only with the `tags_filter` parameter.
            Whether to keep all tags related to the element, or return only those defined
            in the `tags_filter`. When `True`, will override the optional grouping defined
            in the `tags_filter`. Defaults to `False`.
        explode_tags (bool, optional): Whether to split tags into columns based on OSM tag keys.
            If `None`, will be set based on `tags_filter` and `keep_all_tags` parameters.
            If there is tags filter defined and `keep_all_tags` is set to `False`, then it will
            be set to `True`. Otherwise it will be set to `False`. Defaults to `None`.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to `False`.
        filter_osm_ids (list[str], optional): List of OSM feature ids to read from the file.
            Each id has to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'.
            Defaults to `None`, which is treated as an empty list.
        duckdb_table_name (str): Table name in which data will be stored inside the DuckDB
            database (default: "quackosm")

    Returns:
        Path: Path to the generated DuckDB database file.
    """
    if isinstance(pbf_path, (str, Path)):
        pbf_path = [pbf_path]

    parsed_geoparquet_file = self.convert_pbf_to_parquet(
        pbf_path=pbf_path,
        keep_all_tags=keep_all_tags,
        explode_tags=explode_tags,
        ignore_cache=ignore_cache,
        filter_osm_ids=filter_osm_ids,
    )

    if filter_osm_ids is None:
        filter_osm_ids = []

    # generate result_file_path if missing
    result_file_path = Path(
        result_file_path
        or self._generate_result_file_path(
            pbf_path=pbf_path,
            filter_osm_ids=filter_osm_ids,
            keep_all_tags=keep_all_tags,
            explode_tags=explode_tags or False,
            save_as_wkt=False,
        ).with_suffix(".duckdb")
    )

    result_file_path.parent.mkdir(exist_ok=True, parents=True)

    duckdb_table_name = duckdb_table_name or "quackosm"

    with duckdb.connect(str(result_file_path)) as con:
        con.load_extension("spatial")
        con.sql(
            f"""
            CREATE OR REPLACE TABLE {duckdb_table_name} AS
            SELECT * FROM read_parquet('{parsed_geoparquet_file}');
        """
        )

    # clean up intermediary parquet
    parsed_geoparquet_file.unlink()

    return result_file_path

convert_geometry_to_duckdb(
    result_file_path=None,
    keep_all_tags=False,
    explode_tags=None,
    ignore_cache=False,
    filter_osm_ids=None,
    duckdb_table_name="quackosm",
)

Convert OSM features from a provided geometry filter to a DuckDB database.

Will automatically find and download OSM extracts covering a given geometry and save the parsed OSM objects into a single table in a DuckDB database file.

PARAMETER DESCRIPTION
result_file_path

Where to save the duckdb file. If not provided, will be generated based on hashes from provided tags filter and geometry filter. Defaults to None.

TYPE: Union[str, Path] DEFAULT: None

keep_all_tags

Works only with the tags_filter parameter. Whether to keep all tags related to the element, or return only those defined in the tags_filter. When True, will override the optional grouping defined in the tags_filter. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split tags into columns based on OSM tag keys. If None, will be set based on tags_filter and keep_all_tags parameters. If there is tags filter defined and keep_all_tags is set to False, then it will be set to True. Otherwise it will be set to False. Defaults to None.

TYPE: bool DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

filter_osm_ids

List of OSM feature ids to read from the file. Each id has to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'. Defaults to None, which is treated as an empty list.

TYPE: Optional[list[str]] DEFAULT: None

duckdb_table_name

Table name in which data will be stored inside the DuckDB database (default: "quackosm")

TYPE: str DEFAULT: 'quackosm'

RETURNS DESCRIPTION
Path

Path to the generated DuckDB database file.
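
Both DuckDB methods finish by copying the intermediate GeoParquet file into a table with a `CREATE OR REPLACE TABLE ... AS SELECT` statement, as the source listings show. The sketch below only builds that statement as a string so it can be inspected. `build_create_table_sql` is a hypothetical helper, and the `isidentifier()` guard is an added precaution for the unquoted table name, not something the library is known to do.

```python
from pathlib import Path
from typing import Optional, Union


def build_create_table_sql(
    parquet_path: Union[str, Path],
    duckdb_table_name: Optional[str] = "quackosm",
) -> str:
    """Build the statement that loads a parquet file into a DuckDB table."""
    # Fall back to the documented default when no name is given.
    table_name = duckdb_table_name or "quackosm"
    # Assumption: restrict names to simple identifiers, since the name is
    # interpolated into the SQL text without quoting.
    if not table_name.isidentifier():
        raise ValueError(f"Unsupported table name: {table_name!r}")
    return (
        f"CREATE OR REPLACE TABLE {table_name} AS "
        f"SELECT * FROM read_parquet('{parquet_path}');"
    )


# Hypothetical parquet path, for illustration only.
print(build_create_table_sql("files/example_extract.parquet", "osm_features"))
```

Because the statement uses `CREATE OR REPLACE`, running a conversion again with the same database file and table name overwrites the existing table rather than appending to it.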

Source code in quackosm/pbf_file_reader.py
def convert_geometry_to_duckdb(
    self,
    result_file_path: Optional[Union[str, Path]] = None,
    keep_all_tags: bool = False,
    explode_tags: Optional[bool] = None,
    ignore_cache: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
    duckdb_table_name: str = "quackosm",
) -> Path:
    """
    Convert OSM features from a provided geometry filter to a DuckDB database.

    Will automatically find and download OSM extracts covering a given geometry
    and save the parsed OSM objects into a single table in a DuckDB database file.

    Args:
        result_file_path (Union[str, Path], optional): Where to save
            the duckdb file. If not provided, will be generated based on hashes
            from provided tags filter and geometry filter. Defaults to `None`.
        keep_all_tags (bool, optional): Works only with the `tags_filter` parameter.
            Whether to keep all tags related to the element, or return only those defined
            in the `tags_filter`. When `True`, will override the optional grouping defined
            in the `tags_filter`. Defaults to `False`.
        explode_tags (bool, optional): Whether to split tags into columns based on OSM tag keys.
            If `None`, will be set based on `tags_filter` and `keep_all_tags` parameters.
            If there is tags filter defined and `keep_all_tags` is set to `False`, then it will
            be set to `True`. Otherwise it will be set to `False`. Defaults to `None`.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to `False`.
        filter_osm_ids (list[str], optional): List of OSM feature ids to read from the file.
            Each id has to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'.
            Defaults to `None`, which is treated as an empty list.
        duckdb_table_name (str): Table name in which data will be stored inside the DuckDB
            database (default: "quackosm")

    Returns:
        Path: Path to the generated DuckDB database file.
    """
    parsed_geoparquet_file = self.convert_geometry_to_parquet(
        keep_all_tags=keep_all_tags,
        explode_tags=explode_tags,
        ignore_cache=ignore_cache,
        filter_osm_ids=filter_osm_ids,
    )

    if filter_osm_ids is None:
        filter_osm_ids = []

    # generate result_file_path if missing
    result_file_path = Path(
        result_file_path
        or self._generate_result_file_path_from_geometry(
            filter_osm_ids=filter_osm_ids,
            keep_all_tags=keep_all_tags,
            explode_tags=explode_tags or False,
            save_as_wkt=False,
        ).with_suffix(".duckdb")
    )

    # make sure the target directory exists, as convert_pbf_to_duckdb does
    result_file_path.parent.mkdir(exist_ok=True, parents=True)

    with duckdb.connect(str(result_file_path)) as con:
        con.load_extension("spatial")

        con.sql(
            f"""
            CREATE OR REPLACE TABLE {duckdb_table_name} AS
            SELECT * FROM read_parquet('{parsed_geoparquet_file}');
        """
        )

    # clean up intermediary parquet
    parsed_geoparquet_file.unlink()

    return result_file_path