PbfFileReader

quackosm.pbf_file_reader.PbfFileReader(
    tags_filter=None,
    geometry_filter=None,
    working_directory="files",
    osm_way_polygon_features_config=None,
)

PbfFileReader.

PbfFileReader is a dedicated reader class for *.osm.pbf files. It parses the PBF (Protocolbuffer Binary Format)[1] using DuckDB[2] and its spatial extension[3].

The reader can limit the result to OSM features that match a tags filter and a geometry filter.

References

PARAMETER	DESCRIPTION
`tags_filter`	A dictionary specifying which tags to download. The keys should be OSM tags (e.g. `building`, `amenity`). Each value should be `True` to retrieve all objects with the tag, a string to retrieve a single tag-value pair, or a list of strings to retrieve all values specified in the list. `tags_filter={'leisure': 'park'}` would return parks from the area. `tags_filter={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']}` would return parks, all amenity types, bakeries and bicycle shops. If `None`, the reader allows all tags to be parsed. Defaults to `None`. TYPE: `Union[OsmTagsFilter, GroupedOsmTagsFilter]` DEFAULT: `None`
`geometry_filter`	Region used to keep only the OSM objects that intersect it. Defaults to `None`. TYPE: `BaseGeometry` DEFAULT: `None`
`working_directory`	Directory where the parsed `*.parquet` files are saved. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`osm_way_polygon_features_config`	Config used to determine which closed way features are polygons. Modifications to this config are left for experienced OSM users. Defaults to the predefined "osm_way_polygon_features.json". TYPE: `Union[OsmWayPolygonConfig, dict[str, Any]]` DEFAULT: `None`
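
The three value forms accepted by `tags_filter` (`True`, a single string, a list of strings) can be illustrated with a small pure-Python sketch. This is not QuackOSM's implementation, which runs inside DuckDB; the `TagsFilter` alias and the `matches_tags_filter` helper are hypothetical names introduced only to show the matching rules described above.

```python
from typing import Union

# Hypothetical alias for illustration; QuackOSM's own OsmTagsFilter type may differ.
TagsFilter = dict[str, Union[bool, str, list[str]]]


def matches_tags_filter(feature_tags: dict[str, str], tags_filter: TagsFilter) -> bool:
    """Return True if at least one filter entry accepts the feature's tags."""
    for key, accepted in tags_filter.items():
        value = feature_tags.get(key)
        if value is None:
            continue  # the feature does not carry this tag key at all
        if accepted is True:
            return True  # any value of this tag key is accepted
        if isinstance(accepted, str) and value == accepted:
            return True  # single tag-value pair
        if isinstance(accepted, list) and value in accepted:
            return True  # one of several accepted values
    return False


tags_filter: TagsFilter = {
    "leisure": "park",
    "amenity": True,
    "shop": ["bakery", "bicycle"],
}

for tags in (
    {"leisure": "park"},
    {"amenity": "cafe"},
    {"shop": "bakery"},
    {"shop": "florist"},
    {"building": "yes"},
):
    print(tags, matches_tags_filter(tags, tags_filter))
```

With this filter, parks, every amenity, bakeries and bicycle shops are accepted, while a florist or a plain building is not.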

Source code in quackosm/pbf_file_reader.py

def __init__(
    self,
    tags_filter: Optional[Union[OsmTagsFilter, GroupedOsmTagsFilter]] = None,
    geometry_filter: Optional[BaseGeometry] = None,
    working_directory: Union[str, Path] = "files",
    osm_way_polygon_features_config: Optional[
        Union[OsmWayPolygonConfig, dict[str, Any]]
    ] = None,
) -> None:
    """
    Initialize PbfFileReader.

    Args:
        tags_filter (Union[OsmTagsFilter, GroupedOsmTagsFilter], optional): A dictionary
            specifying which tags to download.
            The keys should be OSM tags (e.g. `building`, `amenity`).
            The values should either be `True` for retrieving all objects with the tag,
            string for retrieving a single tag-value pair
            or list of strings for retrieving all values specified in the list.
            `tags_filter={'leisure': 'park'}` would return parks from the area.
            `tags_filter={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']}`
            would return parks, all amenity types, bakeries and bicycle shops.
            If `None`, handler will allow all of the tags to be parsed. Defaults to `None`.
        geometry_filter (BaseGeometry, optional): Region which can be used to filter only
            intersecting OSM objects. Defaults to `None`.
        working_directory (Union[str, Path], optional): Directory where to save
            the parsed `*.parquet` files. Defaults to "files".
        osm_way_polygon_features_config (Union[OsmWayPolygonConfig, dict[str, Any]], optional):
            Config used to determine which closed way features are polygons.
            Modifications to this config are left for experienced OSM users.
            Defaults to predefined "osm_way_polygon_features.json".
    """
    self.tags_filter = tags_filter
    self.merged_tags_filter = merge_osm_tags_filter(tags_filter) if tags_filter else None
    self.geometry_filter = geometry_filter
    self.working_directory = Path(working_directory)
    self.working_directory.mkdir(parents=True, exist_ok=True)
    self.connection: duckdb.DuckDBPyConnection = None
    self.rows_per_bucket = 1_000_000
    if osm_way_polygon_features_config is None:
        # Config based on two sources + manual OSM wiki check
        # 1. https://github.com/tyrasd/osm-polygon-features/blob/v0.9.2/polygon-features.json
        # 2. https://github.com/ideditor/id-area-keys/blob/v5.0.1/areaKeys.json
        osm_way_polygon_features_config = json.loads(
            (Path(__file__).parent / "osm_way_polygon_features.json").read_text()
        )

    self.osm_way_polygon_features_config: OsmWayPolygonConfig = (
        osm_way_polygon_features_config
        if isinstance(osm_way_polygon_features_config, OsmWayPolygonConfig)
        else parse_dict_to_config_object(osm_way_polygon_features_config)
    )

ConvertedOSMParquetFiles

Bases: NamedTuple

List of parquet files read from the *.osm.pbf file.

ParsedOSMFeatures

Bases: NamedTuple

Final list of parsed features from the *.osm.pbf file.

get_features_gdf(
    file_paths,
    explode_tags=None,
    ignore_cache=False,
    filter_osm_ids=None,
)

Get features GeoDataFrame from a list of PBF files.

Function parses multiple PBF files and returns a single GeoDataFrame with parsed OSM objects.

PARAMETER	DESCRIPTION
`file_paths`	Path or list of paths of `*.osm.pbf` files to be parsed. TYPE: `Union[str, Path, Iterable[Union[str, Path]]]`
`explode_tags`	Whether to split tags into columns based on OSM tag keys. If `None`, the value is derived from the `tags_filter` parameter: `False` if no tags filter is provided, `True` if there is one. Defaults to `None`. TYPE: `bool` DEFAULT: `None`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`filter_osm_ids`	List of OSM feature IDs to read from the file. Each has to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'. If `None`, an empty list is used. TYPE: `Optional[list[str]]` DEFAULT: `None`

RETURNS	DESCRIPTION
`GeoDataFrame`	GeoDataFrame with OSM features. TYPE: `GeoDataFrame`
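
How the `None` defaults resolve, and what a well-formed `filter_osm_ids` entry looks like, can be sketched in plain Python. The `explode_tags` rule mirrors the method source shown on this page; the helper names and the regular expression (which assumes numeric ids) are illustrative assumptions, not part of the QuackOSM API.

```python
import re
from typing import Optional

# Assumed shape of 'node/<id>', 'way/<id>' and 'relation/<id>' with a numeric id.
OSM_ID_PATTERN = re.compile(r"^(node|way|relation)/\d+$")


def resolve_explode_tags(explode_tags: Optional[bool], tags_filter: Optional[dict]) -> bool:
    """An explicit value wins; otherwise explode only when a tags filter exists."""
    if explode_tags is None:
        return tags_filter is not None
    return explode_tags


def find_malformed_osm_ids(filter_osm_ids: Optional[list[str]]) -> list[str]:
    """Return the ids that do not follow the '<type>/<id>' convention."""
    return [
        osm_id for osm_id in (filter_osm_ids or []) if not OSM_ID_PATTERN.match(osm_id)
    ]


print(resolve_explode_tags(None, None))
print(resolve_explode_tags(None, {"building": True}))
print(resolve_explode_tags(False, {"building": True}))
print(find_malformed_osm_ids(["way/123", "relation/42", "123", "area/7"]))
```

Checking the ids up front is a design choice of this sketch; this page does not state how the reader itself treats malformed ids.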

Source code in quackosm/pbf_file_reader.py

def get_features_gdf(
    self,
    file_paths: Union[str, Path, Iterable[Union[str, Path]]],
    explode_tags: Optional[bool] = None,
    ignore_cache: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
) -> gpd.GeoDataFrame:
    """
    Get features GeoDataFrame from a list of PBF files.

    Function parses multiple PBF files and returns a single GeoDataFrame with parsed
    OSM objects.

    Args:
        file_paths (Union[str, Path, Iterable[Union[str, Path]]]):
            Path or list of paths of `*.osm.pbf` files to be parsed.
        explode_tags (bool, optional): Whether to split tags into columns based on OSM tag keys.
            If `None`, will be set based on `tags_filter` parameter.
            If no tags filter is provided, then `explode_tags` will be set to `False`,
            if there is a tags filter it will be set to `True`. Defaults to `None`.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        filter_osm_ids (list[str], optional): List of OSM feature IDs to read from the file.
            Have to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'.
            Defaults to an empty list.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with OSM features.
    """
    if isinstance(file_paths, (str, Path)):
        file_paths = [file_paths]

    if filter_osm_ids is None:
        filter_osm_ids = []

    if explode_tags is None:
        explode_tags = self.tags_filter is not None

    parsed_geoparquet_files = []
    for file_path in file_paths:
        parsed_geoparquet_file = self.convert_pbf_to_gpq(
            file_path,
            explode_tags=explode_tags,
            ignore_cache=ignore_cache,
            filter_osm_ids=filter_osm_ids,
        )
        parsed_geoparquet_files.append(parsed_geoparquet_file)

    parquet_tables = [
        io.read_geoparquet_table(parsed_parquet_file)  # type: ignore
        for parsed_parquet_file in parsed_geoparquet_files
    ]
    joined_parquet_table: pa.Table = pa.concat_tables(parquet_tables)
    gdf_parquet = gpd.GeoDataFrame(
        data=joined_parquet_table.drop(GEOMETRY_COLUMN).to_pandas(maps_as_pydicts="strict"),
        geometry=ga.to_geopandas(joined_parquet_table.column(GEOMETRY_COLUMN)),
    ).set_index(FEATURES_INDEX)

    return gdf_parquet

convert_pbf_to_gpq(
    pbf_path,
    result_file_path=None,
    explode_tags=None,
    ignore_cache=False,
    filter_osm_ids=None,
)

Convert PBF file to GeoParquet file.

PARAMETER	DESCRIPTION
`pbf_path`	PBF file to be parsed to GeoParquet. TYPE: `Union[str, Path]`
`result_file_path`	Where to save the geoparquet file. If not provided, the path is generated from hashes of the provided tags filter and geometry filter. Defaults to `None`. TYPE: `Union[str, Path]` DEFAULT: `None`
`explode_tags`	Whether to split tags into columns based on OSM tag keys. If `None`, the value is derived from the `tags_filter` parameter: `False` if no tags filter is provided, `True` if there is one. Defaults to `None`. TYPE: `bool` DEFAULT: `None`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`filter_osm_ids`	List of OSM feature IDs to read from the file. Each has to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'. If `None`, an empty list is used. TYPE: `Optional[list[str]]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Path`	Path to the generated GeoParquet file. TYPE: `Path`
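
The idea of deriving the result path from hashes of the filters, so that identical requests reuse the same cached file, can be sketched as follows. The actual naming scheme lives in a private method that this page does not show, so everything here is an assumption made for illustration: the `sketch_result_path` name, the use of SHA-256, the 8-character digests, the WKT string standing in for the geometry, and the `.geoparquet` suffix.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def sketch_result_path(
    pbf_path: str,
    tags_filter: Optional[dict],
    geometry_wkt: Optional[str],
    working_directory: str = "files",
) -> Path:
    """Build a deterministic file name from the input file and both filters."""

    def short_hash(payload: str) -> str:
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]

    # sort_keys makes the hash independent of dictionary key order.
    tags_part = short_hash(json.dumps(tags_filter, sort_keys=True)) if tags_filter else "nofilter"
    geometry_part = short_hash(geometry_wkt) if geometry_wkt else "noclip"
    stem = Path(pbf_path).name.removesuffix(".osm.pbf")
    return Path(working_directory) / f"{stem}_{tags_part}_{geometry_part}.geoparquet"


first = sketch_result_path("monaco.osm.pbf", {"building": True, "shop": ["bakery"]}, None)
second = sketch_result_path("monaco.osm.pbf", {"shop": ["bakery"], "building": True}, None)
print(first)
print(first == second)
```

Because the name depends only on the inputs, a second call with the same filters points at the same file, which is what makes an `ignore_cache=False` lookup possible.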

Source code in quackosm/pbf_file_reader.py

def convert_pbf_to_gpq(
    self,
    pbf_path: Union[str, Path],
    result_file_path: Optional[Union[str, Path]] = None,
    explode_tags: Optional[bool] = None,
    ignore_cache: bool = False,
    filter_osm_ids: Optional[list[str]] = None,
) -> Path:
    """
    Convert PBF file to GeoParquet file.

    Args:
        pbf_path (Union[str, Path]): PBF file to be parsed to GeoParquet.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from provided tags filter and geometry filter. Defaults to `None`.
        explode_tags (bool, optional): Whether to split tags into columns based on OSM tag keys.
            If `None`, will be set based on `tags_filter` parameter.
            If no tags filter is provided, then `explode_tags` will be set to `False`,
            if there is a tags filter it will be set to `True`. Defaults to `None`.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        filter_osm_ids (list[str], optional): List of OSM feature IDs to read from the file.
            Have to be in the form of 'node/<id>', 'way/<id>' or 'relation/<id>'.
            Defaults to an empty list.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    if filter_osm_ids is None:
        filter_osm_ids = []

    if explode_tags is None:
        explode_tags = self.tags_filter is not None

    with tempfile.TemporaryDirectory(dir=self.working_directory.resolve()) as tmp_dir_name:
        try:
            self._set_up_duckdb_connection(tmp_dir_name)
            result_file_path = result_file_path or self._generate_geoparquet_result_file_path(
                pbf_path,
                filter_osm_ids=filter_osm_ids,
                explode_tags=explode_tags,
            )
            parsed_geoparquet_file = self._parse_pbf_file(
                pbf_path=pbf_path,
                tmp_dir_name=tmp_dir_name,
                result_file_path=Path(result_file_path),
                filter_osm_ids=filter_osm_ids,
                explode_tags=explode_tags,
                ignore_cache=ignore_cache,
            )
            return parsed_geoparquet_file
        finally:
            if self.connection is not None:
                self.connection.close()
                self.connection = None
