
AdministrativeBoundaryRegionalizer

srai.regionalizers.AdministrativeBoundaryRegionalizer(
    admin_level,
    clip_regions=True,
    return_empty_region=False,
    prioritize_english_name=True,
    toposimplify=True,
    remove_artefact_regions=True,
)

Bases: Regionalizer

AdministrativeBoundaryRegionalizer.

Administrative boundary regionalizer allows the given geometries to be divided into boundaries from OpenStreetMap on a given admin_level [1].

Downloads those boundaries online using overpass and osmnx libraries.

Note: offline .pbf loading will be implemented in the future.

Note: option to download historic data will be implemented in the future.

References
  1. https://wiki.openstreetmap.org/wiki/Key:admin_level
PARAMETER DESCRIPTION
admin_level

OpenStreetMap admin_level value. See [1] for detailed description of available values.

TYPE: int

clip_regions

Whether to clip regions using a provided mask. Turning it off can be useful when trying to load regions using a list of points. Defaults to True.

TYPE: bool DEFAULT: True

return_empty_region

Whether to return an empty region to fill remaining space or not. Defaults to False.

TYPE: bool DEFAULT: False

prioritize_english_name

Whether to use the English area name, if available, as a region id first. Defaults to True.

TYPE: bool DEFAULT: True

toposimplify

Whether to simplify topology to reduce the size of geometries or not. The value is passed to the topojson library for topology-aware simplification. Since provided values are treated as degrees, values between 1e-4 and 1.0 are recommended. Defaults to True (which results in a value of 1e-4).

TYPE: Union[bool, float] DEFAULT: True

remove_artefact_regions

Whether to remove small regions barely intersecting the queried area. Turning it off can sometimes load unnecessary boundaries that only touch the edge. It removes regions whose intersection with the queried area is smaller than 1% of their own total area. If the provided query GeoDataFrame contains points, the area calculation is skipped for regions that intersect any point. Defaults to True.

TYPE: bool DEFAULT: True
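
The 1% rule described for remove_artefact_regions can be illustrated with a minimal sketch. It uses axis-aligned rectangles in place of real geometries so that it runs without any geospatial dependency, and it assumes the fraction is measured against the region's own area. `Rect`, `intersection_area` and `is_artefact` are hypothetical names for illustration only and are not part of srai; the special handling of point geometries is left out.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a real region geometry."""

    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of the overlap between two rectangles (0.0 if they are disjoint)."""
    width = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    height = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return max(0.0, width) * max(0.0, height)


def is_artefact(region: Rect, query: Rect, threshold: float = 0.01) -> bool:
    """Return True when the overlap is at most `threshold` of the region's own area."""
    if region.area == 0:
        return True
    return intersection_area(region, query) / region.area <= threshold


query_area = Rect(0.0, 0.0, 10.0, 10.0)
# A neighbouring region that only grazes the right edge of the queried area.
neighbour = Rect(9.95, 0.0, 20.0, 10.0)
# A region lying fully inside the queried area.
inner = Rect(2.0, 2.0, 6.0, 6.0)

print(is_artefact(neighbour, query_area))
print(is_artefact(inner, query_area))
```

Here the neighbouring rectangle overlaps the queried area with roughly 0.5% of its own area, so it would be dropped, while the inner rectangle overlaps completely and would be kept.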

RAISES DESCRIPTION
ValueError

If admin_level is outside available range (1-11). See [2] for list of values with in_wiki selected.

References
  1. https://wiki.openstreetmap.org/wiki/Tag:boundary=administrative#10_admin_level_values_for_specific_countries
  2. https://taginfo.openstreetmap.org/keys/admin_level#values
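
The documented argument handling (the admin_level range check and the mapping of toposimplify to a tolerance) can be sketched as two standalone functions. `validate_admin_level` and `resolve_toposimplify` are hypothetical helper names used only for this illustration; they are not part of the srai API, and the sketch follows the documented behaviour (True maps to 1e-4) rather than reproducing the constructor line by line.

```python
from typing import Union


def validate_admin_level(admin_level: int) -> int:
    """Reject values outside the documented 1-11 range."""
    if admin_level < 1 or admin_level > 11:
        raise ValueError("admin_level outside of available range.")
    return admin_level


def resolve_toposimplify(toposimplify: Union[bool, float]) -> Union[bool, float]:
    """Map the `toposimplify` argument to a tolerance in degrees, or False."""
    # bool is a subclass of int in Python, so it has to be checked first.
    if isinstance(toposimplify, bool):
        return 1e-4 if toposimplify else False
    # Positive numbers are used as the tolerance directly.
    if isinstance(toposimplify, (int, float)) and toposimplify > 0:
        return toposimplify
    # Zero and negative numbers disable simplification.
    return False


print(resolve_toposimplify(True))
print(resolve_toposimplify(0.01))
print(resolve_toposimplify(False))
```

Checking for bool before the numeric types matters because `isinstance(True, int)` is true in Python, so a numeric check placed first would also accept True.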
Source code in srai/regionalizers/administrative_boundary_regionalizer.py
def __init__(
    self,
    admin_level: int,
    clip_regions: bool = True,
    return_empty_region: bool = False,
    prioritize_english_name: bool = True,
    toposimplify: Union[bool, float] = True,
    remove_artefact_regions: bool = True,
) -> None:
    """
    Init AdministrativeBoundaryRegionalizer.

    Args:
        admin_level (int): OpenStreetMap admin_level value. See [1] for detailed description of
            available values.
        clip_regions (bool, optional): Whether to clip regions using a provided mask.
            Turning it off can be useful when trying to load regions using a list of points.
            Defaults to True.
        return_empty_region (bool, optional): Whether to return an empty region to fill
            remaining space or not. Defaults to False.
        prioritize_english_name (bool, optional): Whether to use English area name if available
            as a region id first. Defaults to True.
        toposimplify (Union[bool, float], optional): Whether to simplify topology to reduce
            geometries size or not. Value is passed to `topojson` library for topology-aware
            simplification. Since provided values are treated like degrees, values between
            1e-4 and 1.0 are recommended. Defaults to True (which results in value equal 1e-4).
        remove_artefact_regions (bool, optional): Whether to remove small regions barely
            intersecting queried area. Turning it off can sometimes load unnecessary boundaries
            that touch on the edge. It removes regions whose intersection with the queried
            area is smaller than 1% of their own total area. If the provided query
            GeoDataFrame contains points, the area calculation is skipped for regions
            that intersect any point. Defaults to True.

    Raises:
        ValueError: If admin_level is outside available range (1-11). See [2] for list of
            values with `in_wiki` selected.

    References:
        1. https://wiki.openstreetmap.org/wiki/Tag:boundary=administrative#10_admin_level_values_for_specific_countries
        2. https://taginfo.openstreetmap.org/keys/admin_level#values
    """  # noqa: W505, E501
    import_optional_dependencies(
        dependency_group="osm",
        modules=["osmnx", "overpass"],
    )
    from overpass import API

    if admin_level < 1 or admin_level > 11:
        raise ValueError("admin_level outside of available range.")

    self.admin_level = admin_level
    self.prioritize_english_name = prioritize_english_name
    self.clip_regions = clip_regions
    self.return_empty_region = return_empty_region
    self.remove_artefact_regions = remove_artefact_regions

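    # bool is a subclass of int, so it must be handled before the numeric check;
    # otherwise True would be stored as-is instead of the documented 1e-4.
    if isinstance(toposimplify, bool):
        self.toposimplify = 1e-4 if toposimplify else False
    elif isinstance(toposimplify, (int, float)) and toposimplify > 0:
        self.toposimplify = toposimplify
    else:
        self.toposimplify = False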

    self.overpass_api = API(timeout=60)

transform(gdf)

Regionalize a given GeoDataFrame.

Will query Overpass [1] server using overpass [2] library for closed administrative boundaries on a given admin_level and then download geometries for each relation using osmnx [3] library.

If prioritize_english_name is set to True, the method will try to extract the name:en tag first before resorting to the name tag. If the boundary doesn't have a name tag, its id will be used.
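
The fallback order described above (name:en, then name, then id) can be sketched as a small function over a dictionary of OSM tags. `pick_region_id` is a hypothetical name for illustration and not a method of the regionalizer; the sketch also assumes that the fallback id is returned as a string, which the text above does not specify.

```python
from typing import Mapping


def pick_region_id(
    tags: Mapping[str, str],
    element_id: int,
    prioritize_english_name: bool = True,
) -> str:
    """Pick a region id from OSM tags: `name:en`, then `name`, then the element id."""
    # The English name is consulted only when it is prioritized.
    if prioritize_english_name and tags.get("name:en"):
        return tags["name:en"]
    if tags.get("name"):
        return tags["name"]
    # Assumption: the numeric id is converted to a string when no name exists.
    return str(element_id)


tags = {"name": "Wrocław", "name:en": "Wroclaw"}
print(pick_region_id(tags, element_id=123))
print(pick_region_id(tags, element_id=123, prioritize_english_name=False))
print(pick_region_id({}, element_id=123))
```

The element id 123 is a placeholder, not a real OpenStreetMap relation.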

Before returning downloaded regions, a topology is built and can be simplified to reduce size of geometries while keeping neighbouring regions together without introducing gaps. Topojson library [4] is used for this operation.

Additionally, an empty region with name EMPTY can be introduced if returned regions do not fully cover a clipping mask.

PARAMETER DESCRIPTION
gdf

GeoDataFrame to be regionalized. Will use this list of geometries to crop resulting regions.

TYPE: gpd.GeoDataFrame

RETURNS DESCRIPTION
gpd.GeoDataFrame

gpd.GeoDataFrame: GeoDataFrame with the regionalized data cropped using input GeoDataFrame.

RAISES DESCRIPTION
RuntimeError

If simplification can't preserve a topology.

References
  1. https://wiki.openstreetmap.org/wiki/Overpass_API
  2. https://github.com/mvexel/overpass-api-python-wrapper
  3. https://github.com/gboeing/osmnx
  4. https://github.com/mattijn/topojson
Source code in srai/regionalizers/administrative_boundary_regionalizer.py
def transform(self, gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """
    Regionalize a given GeoDataFrame.

    Will query Overpass [1] server using `overpass` [2] library for closed administrative
    boundaries on a given admin_level and then download geometries for each relation using
    `osmnx` [3] library.

    If `prioritize_english_name` is set to `True`, method will try to extract the `name:en` tag
    first before resorting to the `name` tag. If boundary doesn't have a `name` tag, an `id`
    will be used.

    Before returning downloaded regions, a topology is built and can be simplified to reduce
    size of geometries while keeping neighbouring regions together without introducing gaps.
    `Topojson` library [4] is used for this operation.

    Additionally, an empty region with name `EMPTY` can be introduced if returned regions do
    not fully cover a clipping mask.

    Args:
        gdf (gpd.GeoDataFrame): GeoDataFrame to be regionalized.
            Will use this list of geometries to crop resulting regions.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with the regionalized data cropped using input
            GeoDataFrame.

    Raises:
        RuntimeError: If simplification can't preserve a topology.

    References:
        1. https://wiki.openstreetmap.org/wiki/Overpass_API
        2. https://github.com/mvexel/overpass-api-python-wrapper
        3. https://github.com/gboeing/osmnx
        4. https://github.com/mattijn/topojson
    """
    gdf_wgs84 = gdf.to_crs(crs=WGS84_CRS)

    regions_dicts = self._generate_regions_from_all_geometries(gdf_wgs84)

    if not regions_dicts:
        import warnings

        warnings.warn(
            "Couldn't find any administrative boundaries with"
            f" `admin_level`={self.admin_level}.",
            RuntimeWarning,
            stacklevel=2,
        )

        return self._get_empty_geodataframe(gdf_wgs84)

    regions_gdf = gpd.GeoDataFrame(data=regions_dicts, crs=WGS84_CRS).set_index(REGIONS_INDEX)
    regions_gdf = self._toposimplify_gdf(regions_gdf)

    if self.remove_artefact_regions:
        points_collection: Optional[BaseGeometry] = gdf_wgs84[
            gdf_wgs84.geom_type == "Point"
        ].geometry.unary_union
        clipping_polygon_area: Optional[BaseGeometry] = gdf_wgs84[
            gdf_wgs84.geom_type != "Point"
        ].geometry.unary_union

        regions_to_keep = [
            region_id
            for region_id, row in regions_gdf.iterrows()
            if self._check_intersects_with_points(row["geometry"], points_collection)
            or self._calculate_intersection_area_fraction(
                row["geometry"], clipping_polygon_area
            )
            > 0.01
        ]
        regions_gdf = regions_gdf.loc[regions_to_keep]

    if self.clip_regions:
        regions_gdf = regions_gdf.clip(mask=gdf_wgs84, keep_geom_type=False)

    if self.return_empty_region:
        empty_region = self._generate_empty_region(mask=gdf_wgs84, regions_gdf=regions_gdf)
        if not empty_region.is_empty:
            regions_gdf.loc[
                AdministrativeBoundaryRegionalizer.EMPTY_REGION_NAME, GEOMETRY_COLUMN
            ] = empty_region

    return regions_gdf