Index

OSM Loaders.

OSMLoader

Bases: Loader, ABC

Abstract class for loaders.

load(area, tags)

abstractmethod

Load data for a given area.

PARAMETER DESCRIPTION
area

Shapely geometry with the area of interest.

TYPE: Union[BaseGeometry, Iterable[BaseGeometry], GeoSeries, GeoDataFrame]

tags

OSM tags filter.

TYPE: Union[OsmTagsFilter, GroupedOsmTagsFilter]

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: GeoDataFrame with the downloaded data.

Source code in srai/loaders/osm_loaders/_base.py
@abc.abstractmethod
def load(
    self,
    area: Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame],
    tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
) -> gpd.GeoDataFrame:  # pragma: no cover
    """
    Load data for a given area.

    Args:
        area (Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame]):
            Shapely geometry with the area of interest.
        tags (Union[OsmTagsFilter, GroupedOsmTagsFilter]): OSM tags filter.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with the downloaded data.
    """
    raise NotImplementedError
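
The abstract `load` contract above can be illustrated with a small sketch. `BaseLoader`, `InMemoryLoader` and `matches` are hypothetical stand-ins written for this example (they are not part of srai), and plain dictionaries take the place of geometries and GeoDataFrames.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


def matches(value: Any, wanted: Any) -> bool:
    """Check one feature value against one filter value (True, str or list of str)."""
    if value is None:
        return False
    if wanted is True:
        return True
    if isinstance(wanted, str):
        return value == wanted
    return value in wanted


class BaseLoader(ABC):
    """Hypothetical stand-in for an abstract loader such as OSMLoader."""

    @abstractmethod
    def load(self, area: Any, tags: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Return the features found in `area` that match `tags`."""
        raise NotImplementedError


class InMemoryLoader(BaseLoader):
    """Toy loader that filters a fixed feature list; `area` is ignored here."""

    def __init__(self, features: List[Dict[str, Any]]) -> None:
        self.features = features

    def load(self, area: Any, tags: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Keep a feature if at least one of its tags satisfies the filter.
        return [
            feature
            for feature in self.features
            if any(matches(feature.get(key), wanted) for key, wanted in tags.items())
        ]
```

Instantiating `BaseLoader` directly raises `TypeError` because `load` is abstract; only subclasses that implement it can be created.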

OSMOnlineLoader()

Bases: OSMLoader

OSMOnlineLoader.

OSM (OpenStreetMap)[1] online loader is a loader capable of downloading objects from a given area from OSM. It filters features based on OSM tags[2] in the form of key:value pairs, which OSM users use to give meaning to geometries.

This loader is a wrapper around the osmnx library. It makes individual queries with osmnx.features_from_polygon (osmnx 1.5.0 and newer) or osmnx.geometries_from_polygon (older versions).

References
  1. https://www.openstreetmap.org/
  2. https://wiki.openstreetmap.org/wiki/Tags
Source code in srai/loaders/osm_loaders/osm_online_loader.py
def __init__(self) -> None:
    """Initialize OSMOnlineLoader."""
    import_optional_dependencies(dependency_group="osm", modules=["osmnx"])

load(area, tags)

Download OSM features with specified tags for a given area.

The loader first downloads all objects with tags. It returns a GeoDataFrame containing the geometry column and columns for tag keys.

Note: Some key/value pairs might be missing from the resulting GeoDataFrame simply because there are no such objects in the given area.

PARAMETER DESCRIPTION
area

Area for which to download objects.

TYPE: Union[BaseGeometry, Iterable[BaseGeometry], GeoSeries, GeoDataFrame]

tags

A dictionary specifying which tags to download. The keys should be OSM tags (e.g. building, amenity). The values should be either True for retrieving all objects with the tag, a string for retrieving a single tag-value pair, or a list of strings for retrieving all values specified in the list. tags={'leisure': 'park'} would return parks from the area. tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']} would return parks, all amenity types, bakeries and bicycle shops.

TYPE: Union[OsmTagsFilter, GroupedOsmTagsFilter]

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: Downloaded features as a GeoDataFrame.

Source code in srai/loaders/osm_loaders/osm_online_loader.py
def load(
    self,
    area: Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame],
    tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
) -> gpd.GeoDataFrame:
    """
    Download OSM features with specified tags for a given area.

    The loader first downloads all objects with `tags`. It returns a GeoDataFrame containing
    the `geometry` column and columns for tag keys.

    Note: Some key/value pairs might be missing from the resulting GeoDataFrame,
        simply because there are no such objects in the given area.

    Args:
        area (Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame]):
            Area for which to download objects.
        tags (Union[OsmTagsFilter, GroupedOsmTagsFilter]): A dictionary
            specifying which tags to download.
            The keys should be OSM tags (e.g. `building`, `amenity`).
            The values should either be `True` for retrieving all objects with the tag,
            string for retrieving a single tag-value pair
            or list of strings for retrieving all values specified in the list.
            `tags={'leisure': 'park'}` would return parks from the area.
            `tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']}`
            would return parks, all amenity types, bakeries and bicycle shops.

    Returns:
        gpd.GeoDataFrame: Downloaded features as a GeoDataFrame.
    """
    import osmnx as ox

    area_wgs84 = self._prepare_area_gdf(area)

    merged_tags = merge_osm_tags_filter(tags)

    _tags = self._flatten_tags(merged_tags)

    total_tags_num = len(_tags)
    total_queries = len(area_wgs84) * total_tags_num

    key_value_name_max_len = self._get_max_key_value_name_len(_tags)
    desc_max_len = key_value_name_max_len + len(self._PBAR_FORMAT.format("", ""))

    results = []

    osmnx_new_api = version.parse(ox.__version__) >= version.parse("1.5.0")
    osmnx_download_function = (
        ox.features_from_polygon if osmnx_new_api else ox.geometries_from_polygon
    )

    pbar = tqdm(product(area_wgs84[GEOMETRY_COLUMN], _tags), total=total_queries)
    for polygon, (key, value) in pbar:
        pbar.set_description(self._get_pbar_desc(key, value, desc_max_len))
        geometries = osmnx_download_function(polygon, {key: value})
        if not geometries.empty:
            results.append(geometries[[GEOMETRY_COLUMN, key]])

    result_gdf = self._group_gdfs(results).set_crs(WGS84_CRS)
    result_gdf = self._flatten_index(result_gdf)

    return self._parse_features_gdf_to_groups(result_gdf, tags)
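
The query fan-out in `load` above (one request per polygon and per key/value pair) can be sketched without osmnx. `merge_groups` and `flatten` below are hypothetical simplifications written for this example; they are not srai's `merge_osm_tags_filter` or `_flatten_tags`, whose exact behavior is not shown on this page. The sketch assumes filter values are only `True`, a string, or a list of strings.

```python
from itertools import product
from typing import Dict, List, Tuple, Union

TagValue = Union[bool, str, List[str]]
TagsFilter = Dict[str, TagValue]


def merge_groups(grouped: Dict[str, TagsFilter]) -> TagsFilter:
    """Merge a grouped filter ({group: {key: value}}) into one flat filter."""
    merged: dict = {}
    for group_filter in grouped.values():
        for key, value in group_filter.items():
            # True means "every value of this key", so it wins over any list.
            if value is True or merged.get(key) is True:
                merged[key] = True
                continue
            values = [value] if isinstance(value, str) else list(value)
            merged.setdefault(key, set()).update(values)
    return {k: (True if v is True else sorted(v)) for k, v in merged.items()}


def flatten(tags: TagsFilter) -> List[Tuple[str, Union[bool, str]]]:
    """Expand a flat filter into single (key, value) pairs, one per query."""
    pairs: List[Tuple[str, Union[bool, str]]] = []
    for key, value in tags.items():
        if isinstance(value, list):
            pairs.extend((key, single) for single in value)
        else:
            pairs.append((key, value))
    return pairs


grouped_filter = {
    "green": {"leisure": "park"},
    "food": {"shop": ["bakery"], "amenity": True},
    "bikes": {"shop": "bicycle"},
}
pairs = flatten(merge_groups(grouped_filter))
# Two placeholder polygons stand in for the rows of the prepared area GeoDataFrame.
queries = list(product(["polygon_a", "polygon_b"], pairs))
print(len(queries))  # 2 polygons x 4 key/value pairs = 8 queries
```

The number of requests grows with both the number of polygons and the number of key/value pairs, which is why broad filters over many small areas can be slow with an online loader.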

OSMPbfLoader(
    pbf_file=None,
    download_source="geofabrik",
    download_directory="files",
)

Bases: OSMLoader

OSMPbfLoader.

OSM (OpenStreetMap)[1] PBF (Protocolbuffer Binary Format)[2] loader is a loader capable of loading OSM features from a PBF file. It filters features based on OSM tags in the form of key:value pairs, which OSM users use to give meaning to geometries.

This loader uses PbfFileReader from the QuackOSM[3] library. It utilizes the duckdb[4] engine with spatial[5] extension capable of parsing an *.osm.pbf file.

Additionally, it can download a pbf file extract for a given area using different sources.

References
  1. https://www.openstreetmap.org/
  2. https://wiki.openstreetmap.org/wiki/PBF_Format
  3. https://github.com/kraina-ai/quackosm
  4. https://duckdb.org/
  5. https://github.com/duckdb/duckdb_spatial
PARAMETER DESCRIPTION
pbf_file

Downloaded *.osm.pbf file to be used by the loader. If not provided, it will be automatically downloaded for a given area. Defaults to None.

TYPE: Union[str, Path] DEFAULT: None

download_source

Source to use when downloading PBF files. Can be one of: any, geofabrik, osmfr, bbbike. Defaults to "geofabrik".

TYPE: OsmExtractSource DEFAULT: 'geofabrik'

download_directory

Directory where to save the downloaded *.osm.pbf files. Ignored if pbf_file is provided. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
def __init__(
    self,
    pbf_file: Optional[Union[str, Path]] = None,
    download_source: "OsmExtractSource" = "geofabrik",
    download_directory: Union[str, Path] = "files",
) -> None:
    """
    Initialize OSMPbfLoader.

    Args:
        pbf_file (Union[str, Path], optional): Downloaded `*.osm.pbf` file to be used by
            the loader. If not provided, it will be automatically downloaded for a given area.
            Defaults to None.
        download_source (OsmExtractSource, optional): Source to use when downloading PBF files.
            Can be one of: `any`, `geofabrik`, `osmfr`, `bbbike`.
            Defaults to "geofabrik".
        download_directory (Union[str, Path], optional): Directory where to save the downloaded
            `*.osm.pbf` files. Ignored if `pbf_file` is provided. Defaults to "files".
    """
    import_optional_dependencies(dependency_group="osm", modules=["quackosm"])
    self.pbf_file = pbf_file
    self.download_source = download_source
    self.download_directory = download_directory

load(
    area,
    tags,
    ignore_cache=False,
    explode_tags=True,
    keep_all_tags=False,
)

Load OSM features with specified tags for a given area from an *.osm.pbf file.

The loader will use provided *.osm.pbf file, or download extracts automatically. Later it will parse and filter features from files using PbfFileReader from QuackOSM library. It will return a GeoDataFrame containing the geometry column and columns for tag keys.

Note: Some key/value pairs might be missing from the resulting GeoDataFrame simply because there are no such objects in the given area.

PARAMETER DESCRIPTION
area

Area for which to download objects.

TYPE: Union[BaseGeometry, Iterable[BaseGeometry], GeoSeries, GeoDataFrame]

tags

A dictionary specifying which tags to download. The keys should be OSM tags (e.g. building, amenity). The values should be either True for retrieving all objects with the tag, a string for retrieving a single tag-value pair, or a list of strings for retrieving all values specified in the list. tags={'leisure': 'park'} would return parks from the area. tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']} would return parks, all amenity types, bakeries and bicycle shops.

TYPE: Union[OsmTagsFilter, GroupedOsmTagsFilter]

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split OSM tags into multiple columns or keep them in a single dict. Defaults to True.

TYPE: bool DEFAULT: True

keep_all_tags

Whether to keep all tags related to the element, or return only those defined in the tags_filter. When True, will override the optional grouping defined in the tags_filter. Defaults to False.

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
ValueError

If PBF file is expected to be downloaded and provided geometries aren't shapely.geometry.Polygons.

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: Downloaded features as a GeoDataFrame.

Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
def load(
    self,
    area: Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame],
    tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
    ignore_cache: bool = False,
    explode_tags: bool = True,
    keep_all_tags: bool = False,
) -> gpd.GeoDataFrame:
    """
    Load OSM features with specified tags for a given area from an `*.osm.pbf` file.

    The loader will use provided `*.osm.pbf` file, or download extracts
    automatically. Later it will parse and filter features from files
    using `PbfFileReader` from `QuackOSM` library. It will return a GeoDataFrame
    containing the `geometry` column and columns for tag keys.

    Note: Some key/value pairs might be missing from the resulting GeoDataFrame,
        simply because there are no such objects in the given area.

    Args:
        area (Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame]):
            Area for which to download objects.
        tags (Union[OsmTagsFilter, GroupedOsmTagsFilter]): A dictionary
            specifying which tags to download.
            The keys should be OSM tags (e.g. `building`, `amenity`).
            The values should either be `True` for retrieving all objects with the tag,
            string for retrieving a single tag-value pair
            or list of strings for retrieving all values specified in the list.
            `tags={'leisure': 'park'}` would return parks from the area.
            `tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']}`
            would return parks, all amenity types, bakeries and bicycle shops.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        explode_tags (bool, optional): Whether to split OSM tags into multiple columns or keep
            them in a single dict. Defaults to True.
        keep_all_tags (bool, optional): Whether to keep all tags related to the element,
            or return only those defined in the `tags_filter`. When True, will override
            the optional grouping defined in the `tags_filter`. Defaults to False.

    Raises:
        ValueError: If PBF file is expected to be downloaded and provided geometries
            aren't shapely.geometry.Polygons.

    Returns:
        gpd.GeoDataFrame: Downloaded features as a GeoDataFrame.
    """
    area_wgs84 = self._prepare_area_gdf(area)

    pbf_reader = self._get_pbf_file_reader(area_wgs84, tags)

    if self.pbf_file is not None:
        features_gdf = pbf_reader.get_features_gdf(
            file_paths=self.pbf_file,
            keep_all_tags=keep_all_tags,
            explode_tags=explode_tags,
            ignore_cache=ignore_cache,
        )
    else:
        features_gdf = pbf_reader.get_features_gdf_from_geometry(
            keep_all_tags=keep_all_tags, explode_tags=explode_tags, ignore_cache=ignore_cache
        )

    features_gdf = features_gdf.set_crs(WGS84_CRS)

    features_columns = [
        column
        for column in features_gdf.columns
        if column != GEOMETRY_COLUMN and features_gdf[column].notnull().any()
    ]
    features_gdf = features_gdf[[GEOMETRY_COLUMN, *sorted(features_columns)]]

    return features_gdf
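
The last step of `load` above keeps the geometry column first, drops tag columns that contain no values at all, and sorts the remaining ones by name. A minimal sketch of that selection in plain Python follows; `select_columns` is a hypothetical helper, and a list of dictionaries stands in for the GeoDataFrame.

```python
from typing import Any, Dict, List

GEOMETRY_COLUMN = "geometry"


def select_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the geometry column followed by the sorted non-empty tag columns."""
    all_columns = {column for row in rows for column in row}
    non_empty = [
        column
        for column in all_columns
        # A column survives only if at least one row has a value in it.
        if column != GEOMETRY_COLUMN and any(row.get(column) is not None for row in rows)
    ]
    return [GEOMETRY_COLUMN, *sorted(non_empty)]


rows = [
    {"geometry": "POINT (0 0)", "shop": "bakery", "amenity": None, "leisure": None},
    {"geometry": "POINT (1 1)", "shop": None, "amenity": "cafe", "leisure": None},
]
print(select_columns(rows))  # ['geometry', 'amenity', 'shop']
```

This is why a requested key such as `leisure` can be absent from the result: if no object in the area carries it, its column is empty and is dropped.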

load_to_geoparquet(
    area,
    tags,
    ignore_cache=False,
    explode_tags=True,
    keep_all_tags=False,
)

Load OSM features with specified tags for a given area and save it to geoparquet file.

PARAMETER DESCRIPTION
area

Area for which to download objects.

TYPE: Union[BaseGeometry, Iterable[BaseGeometry], GeoSeries, GeoDataFrame]

tags

A dictionary specifying which tags to download. The keys should be OSM tags (e.g. building, amenity). The values should be either True for retrieving all objects with the tag, a string for retrieving a single tag-value pair, or a list of strings for retrieving all values specified in the list. tags={'leisure': 'park'} would return parks from the area. tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']} would return parks, all amenity types, bakeries and bicycle shops.

TYPE: Union[OsmTagsFilter, GroupedOsmTagsFilter]

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

explode_tags

Whether to split OSM tags into multiple columns or keep them in a single dict. Defaults to True.

TYPE: bool DEFAULT: True

keep_all_tags

Whether to keep all tags related to the element, or return only those defined in the tags_filter. When True, will override the optional grouping defined in the tags_filter. Defaults to False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Path

Path to the saved GeoParquet file.

TYPE: Path

Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
def load_to_geoparquet(
    self,
    area: Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame],
    tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
    ignore_cache: bool = False,
    explode_tags: bool = True,
    keep_all_tags: bool = False,
) -> Path:
    """
    Load OSM features with specified tags for a given area and save it to geoparquet file.

    Args:
        area (Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame]):
            Area for which to download objects.
        tags (Union[OsmTagsFilter, GroupedOsmTagsFilter]): A dictionary
            specifying which tags to download.
            The keys should be OSM tags (e.g. `building`, `amenity`).
            The values should either be `True` for retrieving all objects with the tag,
            string for retrieving a single tag-value pair
            or list of strings for retrieving all values specified in the list.
            `tags={'leisure': 'park'}` would return parks from the area.
            `tags={'leisure': 'park', 'amenity': True, 'shop': ['bakery', 'bicycle']}`
            would return parks, all amenity types, bakeries and bicycle shops.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        explode_tags (bool, optional): Whether to split OSM tags into multiple columns or keep
            them in a single dict. Defaults to True.
        keep_all_tags (bool, optional): Whether to keep all tags related to the element,
            or return only those defined in the `tags_filter`. When True, will override
            the optional grouping defined in the `tags_filter`. Defaults to False.

    Returns:
        Path: Path to the saved GeoParquet file.
    """
    area_wgs84 = self._prepare_area_gdf(area)

    pbf_reader = self._get_pbf_file_reader(area_wgs84, tags)

    geoparquet_file_path: Path

    if self.pbf_file is not None:
        geoparquet_file_path = pbf_reader.convert_pbf_to_gpq(
            pbf_path=self.pbf_file,
            keep_all_tags=keep_all_tags,
            explode_tags=explode_tags,
            ignore_cache=ignore_cache,
        )
    else:
        geoparquet_file_path = pbf_reader.convert_geometry_filter_to_gpq(
            keep_all_tags=keep_all_tags, explode_tags=explode_tags, ignore_cache=ignore_cache
        )

    return geoparquet_file_path

OSMTileLoader(
    tile_server_url,
    zoom,
    verbose=False,
    resource_type="png",
    auth_token=None,
    data_collector=None,
    storage_path=None,
)

OSM Tile Loader.

Downloads raster tiles from a user-specified tile server, such as those listed in [1]. The loader finds the x, y tile coordinates [2] for the specified area and downloads the tiles. The address is built with the schema {tile_server_url}/{zoom}/{x}/{y}.{resource_type}.

References
  1. https://wiki.openstreetmap.org/wiki/Raster_tile_providers
  2. https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames
PARAMETER DESCRIPTION
tile_server_url

url of tile server, without z, x, y parameters

TYPE: str

zoom

zoom level [1]

TYPE: int

verbose

should print logs. Defaults to False.

TYPE: bool DEFAULT: False

resource_type

file extension. Added to the end of url. Defaults to "png".

TYPE: str DEFAULT: 'png'

auth_token

auth token. Added as access_token parameter to request. Defaults to None.

TYPE: str DEFAULT: None

data_collector

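DataCollector object or enum defining the default collector: return uses InMemoryDataCollector, save uses SavingDataCollector. If None, InMemoryDataCollector is used. Defaults to None.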

TYPE: Union[str, DataCollector] DEFAULT: None

storage_path

path to save data, used with SavingDataCollector. Defaults to None.

TYPE: Union[str, Path] DEFAULT: None

References
  1. https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames
Source code in srai/loaders/osm_loaders/osm_tile_loader.py
def __init__(
    self,
    tile_server_url: str,
    zoom: int,
    verbose: bool = False,
    resource_type: str = "png",
    auth_token: Optional[str] = None,
    data_collector: Optional[Union[str, DataCollector]] = None,
    storage_path: Optional[Union[str, Path]] = None,
) -> None:
    """
    Initialize OSMTileLoader.

    Args:
        tile_server_url (str): url of tile server, without z, x, y parameters
        zoom (int): zoom level [1]
        verbose (bool, optional): should print logs. Defaults to False.
        resource_type (str, optional): file extension. Added to the end of url.
            Defaults to "png".
        auth_token (str, optional): auth token. Added as access_token parameter
            to request. Defaults to None.
        data_collector (Union[str, DataCollector], optional): DataCollector object or
            enum defining default collector. If None uses InMemoryDataCollector.
            If `return` uses InMemoryDataCollector.
            If `save` uses SavingDataCollector.
            Defaults to None.
        storage_path (Union[str, Path], optional): path to save data,
            used with SavingDataCollector. Defaults to None.

    References:
        1. https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames
    """
    import_optional_dependencies(dependency_group="osm", modules=["PIL"])
    self.zoom = zoom
    self.verbose = verbose
    self.resource_type = resource_type
    self.base_url = urljoin(tile_server_url, "{0}/{1}/{2}." + resource_type)
    self.auth_token = auth_token
    self.save_path = storage_path
    self.data_collector = (
        self._get_collector(data_collector)
        if data_collector is not None
        else InMemoryDataCollector()
    )
    self.regionalizer = SlippyMapRegionalizer(zoom=self.zoom)
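
The URL template built in `__init__` above relies on `urllib.parse.urljoin`, so the result depends on whether `tile_server_url` ends with a slash. The sketch below reproduces only that one line; `build_url_template` is a hypothetical helper and `tiles.example.com` is a placeholder host, not a real tile server.

```python
from urllib.parse import urljoin


def build_url_template(tile_server_url: str, resource_type: str = "png") -> str:
    """Build a template with {0}=zoom, {1}=x, {2}=y, as in the constructor above."""
    return urljoin(tile_server_url, "{0}/{1}/{2}." + resource_type)


# With a trailing slash the path prefix is kept.
with_slash = build_url_template("https://tiles.example.com/base/")
print(with_slash.format(10, 559, 330))
# https://tiles.example.com/base/10/559/330.png

# Without it, urljoin replaces the last path segment ("base" is dropped).
without_slash = build_url_template("https://tiles.example.com/base")
print(without_slash.format(10, 559, 330))
# https://tiles.example.com/10/559/330.png
```

When the tile server URL contains a path prefix, ending it with `/` keeps that prefix in the generated tile addresses.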

load(area)

Return all tiles of region.

PARAMETER DESCRIPTION
area

Area for which to download objects.

TYPE: Union[BaseGeometry, Iterable[BaseGeometry], GeoSeries, GeoDataFrame]

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: GeoDataFrame with one tile per region in the area, each transformed by the DataCollector.

Source code in srai/loaders/osm_loaders/osm_tile_loader.py
def load(
    self,
    area: Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame],
) -> gpd.GeoDataFrame:
    """
    Return all tiles of region.

    Args:
        area (Union[BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame]):
            Area for which to download objects.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with one tile per region in the area,
            each transformed by the DataCollector.
    """
    area_wgs84 = prepare_area_gdf_for_loader(area)
    regions = self.regionalizer.transform(gdf=area_wgs84)
    regions["tile"] = regions.apply(self._get_tile_for_area, axis=1)
    return regions
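
The x, y tile coordinates mentioned above follow the slippy map tile naming scheme. The sketch below uses the standard formula from the Slippy map tilenames reference for a single point; it is not the `SlippyMapRegionalizer` implementation, which covers a whole area, and `lon_lat_to_tile` is a hypothetical helper name.

```python
import math
from typing import Tuple


def lon_lat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) index of the tile containing a WGS84 point at `zoom`."""
    n = 2**zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # Web Mercator projection of the latitude, scaled to the tile grid.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


print(lon_lat_to_tile(0.0, 0.0, 0))  # (0, 0): a single tile covers the world
print(lon_lat_to_tile(0.0, 0.0, 1))  # (1, 1): lower-right tile of the 2 x 2 grid
```

Each extra zoom level doubles the grid along both axes, so the number of tiles needed for a fixed area grows roughly fourfold per level.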

get_tile_by_x_y(x, y, idx=None)

Download single tile from tile server. Return tile processed by DataCollector.

PARAMETER DESCRIPTION
x

x tile coordinate

TYPE: int

y

y tile coordinate

TYPE: int

idx

ID of the tile. If None, it is created as {x}_{y}_{zoom}.

TYPE: Any DEFAULT: None

Source code in srai/loaders/osm_loaders/osm_tile_loader.py
def get_tile_by_x_y(self, x: int, y: int, idx: Any = None) -> Any:
    """
    Download single tile from tile server. Return tile processed by DataCollector.

    Args:
        x (int): x tile coordinate
        y (int): y tile coordinate
        idx (Any): ID of the tile. If None, it is created as `{x}_{y}_{zoom}`.
    """
    from PIL import Image

    if idx is None:
        idx = f"{x}_{y}_{self.zoom}"
    url = self.base_url.format(self.zoom, x, y)
    if self.verbose:
        print(f"Getting tile from url: {url}")
    content = requests.get(url, params=dict(access_token=self.auth_token)).content
    tile = Image.open(BytesIO(content))
    return self.data_collector.store(idx, tile)
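
The call `self.data_collector.store(idx, tile)` above hands each tile to a collector that decides what is returned. A minimal sketch of that pattern follows. `KeepInMemory` and `SaveToDisk` are hypothetical classes written for this example and are assumptions about the general idea only; they are not srai's `InMemoryDataCollector` or `SavingDataCollector`, and raw bytes stand in for the PIL image.

```python
import tempfile
from pathlib import Path
from typing import Any, Union


def default_tile_id(x: int, y: int, zoom: int) -> str:
    """Build the default tile id used when `idx` is None."""
    return f"{x}_{y}_{zoom}"


class KeepInMemory:
    """Hypothetical collector that returns the data unchanged."""

    def store(self, idx: Any, data: bytes) -> bytes:
        return data


class SaveToDisk:
    """Hypothetical collector that writes the data to a file and returns its path."""

    def __init__(self, directory: Union[str, Path], extension: str = "png") -> None:
        self.directory = Path(directory)
        self.extension = extension

    def store(self, idx: Any, data: bytes) -> Path:
        path = self.directory / f"{idx}.{self.extension}"
        path.write_bytes(data)
        return path


tile_id = default_tile_id(559, 330, 10)
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = SaveToDisk(tmp_dir).store(tile_id, b"fake image bytes")
    saved_name = saved_path.name
    saved_bytes = saved_path.read_bytes()
print(tile_id, saved_name)  # 559_330_10 559_330_10.png
```

Keeping storage behind a single `store` method lets the download code stay the same whether tiles are kept in memory or written to disk.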