Advanced functions

overturemaestro.advanced_functions

Advanced functions.

This module contains dedicated functions for specific use cases.

convert_bounding_box_to_wide_form_geodataframe

convert_bounding_box_to_wide_form_geodataframe(
    theme: str,
    type: str,
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given bounding box in a wide format.

Automatically downloads the Overture Maps dataset for a given release and theme/type concurrently and returns a single GeoDataFrame with multiple columns based on the dataset schema.
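
As a rough illustration of what "wide format" means here, the sketch below pivots per-feature hierarchy values into boolean columns. It is a toy model in plain Python, not the library's implementation: the function name `to_wide_form`, the `|`-joined column names, and the sample categories are all assumptions made for illustration. It also mimics the idea behind `include_all_possible_columns` (a fixed column list, so some columns can be entirely False) and `hierarchy_depth` (fewer hierarchy levels, so fewer and coarser columns).

```python
from typing import Optional


def to_wide_form(
    features: list[tuple[str, ...]],
    all_known_columns: Optional[list[str]] = None,
    hierarchy_depth: Optional[int] = None,
) -> list[dict[str, bool]]:
    """Pivot per-feature hierarchy values into boolean columns (toy model)."""
    # Slicing with None keeps every level; an integer keeps only the first N levels.
    # Hypothetical naming scheme: hierarchy levels joined with "|".
    names = ["|".join(levels[:hierarchy_depth]) for levels in features]
    # Use the fixed column list when given (stable schema), else only the observed values.
    columns = sorted(set(names if all_known_columns is None else all_known_columns))
    # One row per feature: True only in the column matching that feature's hierarchy.
    return [{column: column == name for column in columns} for name in names]


# Hypothetical sample data: two features described by (subtype, class)-like levels.
features = [("residential", "house"), ("commercial", "office")]
known = ["residential|house", "commercial|office", "education|school"]

fixed_schema = to_wide_form(features, all_known_columns=known)
shallow = to_wide_form(features, hierarchy_depth=1)
```

With the fixed column list, the `education|school` column exists but is False for every row, which is the trade-off the `include_all_possible_columns` description refers to.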

PARAMETER DESCRIPTION
theme

Theme of the dataset.

TYPE: str

type

Type of the dataset.

TYPE: str

bbox

Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax.

TYPE: tuple[float, float, float, float]

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the result. This guarantees an identical set of columns for a given release regardless of region, which also means that some columns may contain only False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. If None, will use all available columns. Defaults to None.

TYPE: Optional[int] DEFAULT: None

pyarrow_filters

A pyarrow expression used to filter the given theme/type pair. Defaults to None.

TYPE: Optional[PYARROW_FILTER] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Sets the progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient shows progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.

Source code in overturemaestro/advanced_functions/functions.py
def convert_bounding_box_to_wide_form_geodataframe(
    theme: str,
    type: str,
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given bounding box in a wide format.

    Automatically downloads Overture Maps dataset for a given release and theme/type
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        theme (str): Theme of the dataset.
        type (str): Type of the dataset.
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[int], optional): Depth used to calculate how many hierarchy
            columns should be used to generate the wide form of the data. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[PYARROW_FILTER], optional): A pyarrow expression used to filter
            specific theme type pair. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    return convert_geometry_to_wide_form_geodataframe(
        theme=theme,
        type=type,
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
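
A minimal usage sketch for the function above, assuming it can be imported from `overturemaestro.advanced_functions` and that `("buildings", "building")` is a theme/type pair supported in wide form. The helper `validate_bbox`, the wrapper `load_buildings_wide`, and the sample coordinates (longitude/latitude, roughly over Monaco) are hypothetical and only illustrate the `xmin, ymin, xmax, ymax` ordering.

```python
def validate_bbox(
    bbox: tuple[float, float, float, float],
) -> tuple[float, float, float, float]:
    """Check the (xmin, ymin, xmax, ymax) ordering before starting a download."""
    xmin, ymin, xmax, ymax = bbox
    if xmin >= xmax or ymin >= ymax:
        raise ValueError(f"Expected (xmin, ymin, xmax, ymax), got {bbox}")
    return bbox


def load_buildings_wide(bbox: tuple[float, float, float, float]):
    """Download buildings for the bbox and return them as a wide-form GeoDataFrame."""
    # Imported lazily so the helper above can be used without the library installed.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_geodataframe,
    )

    return convert_bounding_box_to_wide_form_geodataframe(
        theme="buildings",  # assumed to be available in wide form
        type="building",
        bbox=validate_bbox(bbox),
        verbosity_mode="silent",
    )


# Hypothetical example area; calling load_buildings_wide(MONACO_BBOX) needs network access.
MONACO_BBOX = (7.416, 43.730, 7.421, 43.733)
```

Validating the order up front is a cheap guard, since swapping the minimum and maximum values is an easy mistake to make and is easier to diagnose before any download starts.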

convert_bounding_box_to_wide_form_geodataframe_for_all_types

convert_bounding_box_to_wide_form_geodataframe_for_all_types(
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given bounding box in a wide format for all types.

Automatically downloads the Overture Maps dataset for a given release and all available theme/type pairs concurrently and returns a single GeoDataFrame with multiple columns based on the dataset schema.

PARAMETER DESCRIPTION
bbox

Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax.

TYPE: tuple[float, float, float, float]

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the result. This guarantees an identical set of columns for a given release regardless of region, which also means that some columns may contain only False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None.

TYPE: Optional[Union[int, list[Optional[int]]]] DEFAULT: None

pyarrow_filters

A list of pyarrow expressions, one per theme/type pair, used to filter each pair. An entry may be None. Must be the same length as the list of theme/type pairs. Defaults to None.

TYPE: Optional[list[Optional[PYARROW_FILTER]]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Sets the progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient shows progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.

Source code in overturemaestro/advanced_functions/functions.py
def convert_bounding_box_to_wide_form_geodataframe_for_all_types(
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given bounding box in a wide format for all types.

    Automatically downloads Overture Maps dataset for a given release and all available theme/types
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    return convert_geometry_to_wide_form_geodataframe_for_all_types(
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
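
A short sketch of calling the all-types variant with a single `hierarchy_depth` applied to every theme/type pair. The import path is assumed to be `overturemaestro.advanced_functions`; `bbox_around`, `load_all_types_wide`, and the chosen argument values are hypothetical and only serve as an example.

```python
def bbox_around(
    lon: float, lat: float, half_size_deg: float
) -> tuple[float, float, float, float]:
    """Build an (xmin, ymin, xmax, ymax) box centred on a lon/lat point."""
    return (
        lon - half_size_deg,
        lat - half_size_deg,
        lon + half_size_deg,
        lat + half_size_deg,
    )


def load_all_types_wide(bbox: tuple[float, float, float, float], depth: int = 1):
    """Download every available theme/type for the bbox as one wide-form GeoDataFrame."""
    # Imported lazily so bbox_around can be used without the library installed.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_geodataframe_for_all_types,
    )

    return convert_bounding_box_to_wide_form_geodataframe_for_all_types(
        bbox=bbox,
        hierarchy_depth=depth,  # a single integer, used for all pairs
        max_workers=2,  # example value; None is the documented default
    )


# Example box of one degree per side; running the download needs network access.
example_bbox = bbox_around(10.0, 50.0, 0.5)
```

Keeping the area small and `hierarchy_depth` low is a sensible starting point here, because this variant downloads every available theme/type pair for the box.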

convert_bounding_box_to_wide_form_geodataframe_for_multiple_types

convert_bounding_box_to_wide_form_geodataframe_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given bounding box in a wide format for multiple types.

Automatically downloads the Overture Maps dataset for a given release and the given theme/type pairs concurrently and returns a single GeoDataFrame with multiple columns based on the dataset schema.

PARAMETER DESCRIPTION
theme_type_pairs

Pairs of themes and types of the dataset.

TYPE: list[tuple[str, str]]

bbox

Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax.

TYPE: tuple[float, float, float, float]

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the result. This guarantees an identical set of columns for a given release regardless of region, which also means that some columns may contain only False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None.

TYPE: Optional[Union[int, list[Optional[int]]]] DEFAULT: None

pyarrow_filters

A list of pyarrow expressions, one per theme/type pair, used to filter each pair. An entry may be None. Must be the same length as the list of theme/type pairs. Defaults to None.

TYPE: Optional[list[Optional[PYARROW_FILTER]]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Sets the progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient shows progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
GeoDataFrame

gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.

Source code in overturemaestro/advanced_functions/functions.py
def convert_bounding_box_to_wide_form_geodataframe_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given bounding box in a wide format for multiple types.

    Automatically downloads Overture Maps dataset for a given release and theme/type pairs
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        theme_type_pairs (list[tuple[str, str]]): Pairs of themes and types of the dataset.
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    return convert_geometry_to_wide_form_geodataframe_for_multiple_types(
        theme_type_pairs=theme_type_pairs,
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
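
Because `pyarrow_filters` must have the same length as `theme_type_pairs`, it can help to build the list from a mapping so that the two stay aligned. The sketch below is one possible way to do that; `align_filters`, `load_pairs_wide`, and the sample pairs are hypothetical, the import path is assumed to be `overturemaestro.advanced_functions`, and filters are treated as opaque objects (in real use they would be pyarrow filter expressions).

```python
from typing import Any, Optional

ThemeTypePair = tuple[str, str]


def align_filters(
    theme_type_pairs: list[ThemeTypePair],
    filters_by_pair: dict[ThemeTypePair, Any],
) -> list[Optional[Any]]:
    """Return one filter (or None) per pair, in the same order as the pairs."""
    # Catch filters attached to pairs that were never requested.
    unknown = set(filters_by_pair) - set(theme_type_pairs)
    if unknown:
        raise KeyError(f"Filters given for pairs that are not requested: {sorted(unknown)}")
    return [filters_by_pair.get(pair) for pair in theme_type_pairs]


def load_pairs_wide(
    bbox: tuple[float, float, float, float],
    theme_type_pairs: list[ThemeTypePair],
    filters_by_pair: dict[ThemeTypePair, Any],
):
    """Download the requested pairs as one wide-form GeoDataFrame."""
    # Imported lazily so align_filters can be used without the library installed.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_geodataframe_for_multiple_types,
    )

    return convert_bounding_box_to_wide_form_geodataframe_for_multiple_types(
        theme_type_pairs=theme_type_pairs,
        bbox=bbox,
        pyarrow_filters=align_filters(theme_type_pairs, filters_by_pair),
    )


# Example pairs, assumed to be supported in wide form.
PAIRS = [("buildings", "building"), ("places", "place")]
```

Pairs without an entry in the mapping get `None`, which matches the `Optional` entries allowed by the parameter's type.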

convert_bounding_box_to_wide_form_parquet

convert_bounding_box_to_wide_form_parquet(
    theme: str,
    type: str,
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given bounding box in a wide format.

Automatically downloads the Overture Maps dataset for a given release and theme/type concurrently and returns the path to a single file with multiple columns based on the dataset schema.

PARAMETER DESCRIPTION
theme

Theme of the dataset.

TYPE: str

type

Type of the dataset.

TYPE: str

bbox

Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax.

TYPE: tuple[float, float, float, float]

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always include the same list of columns in the resulting file. This guarantees an identical set of columns for a given release regardless of region, which also means that some columns may contain only False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. If None, will use all available columns. Defaults to None.

TYPE: Optional[int] DEFAULT: None

pyarrow_filters

A pyarrow expression used to filter the given theme/type pair. Defaults to None.

TYPE: Optional[PYARROW_FILTER] DEFAULT: None

result_file_path

Where to save the geoparquet file. If not provided, will be generated based on hashes from filters. Defaults to None.

TYPE: Optional[Union[str, Path]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Sets the progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient shows progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
Path

Path to the generated GeoParquet file.

TYPE: Path

Source code in overturemaestro/advanced_functions/functions.py
def convert_bounding_box_to_wide_form_parquet(
    theme: str,
    type: str,
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given bounding box in a wide format.

    Automatically downloads Overture Maps dataset for a given release and theme/type
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        theme (str): Theme of the dataset.
        type (str): Type of the dataset.
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[int], optional): Depth used to calculate how many hierarchy
            columns should be used to generate the wide form of the data. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[PYARROW_FILTER], optional): A pyarrow
            expression used to filter specific theme type pair. Defaults to None.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    return convert_geometry_to_wide_form_parquet(
        theme=theme,
        type=type,
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
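
A sketch of the file-based variant with an explicit `result_file_path` and the two places-specific options. The import path is assumed to be `overturemaestro.advanced_functions`; the helpers `result_path` and `export_places_wide`, the file naming scheme, the `exports` directory, and the chosen confidence value are hypothetical.

```python
from pathlib import Path


def result_path(directory: str, theme: str, type_: str, release: str = "latest") -> Path:
    """Build a readable output path instead of relying on the hash-based default."""
    return Path(directory) / f"{theme}_{type_}_{release}_wide.parquet"


def export_places_wide(
    bbox: tuple[float, float, float, float], directory: str = "exports"
) -> Path:
    """Write wide-form places for the bbox to a GeoParquet file and return its path."""
    # Imported lazily so result_path can be used without the library installed.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_parquet,
    )

    target = result_path(directory, "places", "place")
    # Create the parent directory up front; whether the library does this is not assumed.
    target.parent.mkdir(parents=True, exist_ok=True)

    return convert_bounding_box_to_wide_form_parquet(
        theme="places",
        type="place",
        bbox=bbox,
        result_file_path=target,
        places_use_primary_category_only=True,
        places_minimal_confidence=0.9,  # example value, stricter than the 0.75 default
    )


example_target = result_path("exports", "places", "place")
```

The returned `Path` can then be opened with any GeoParquet reader, for example `geopandas.read_parquet`.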

convert_bounding_box_to_wide_form_parquet_for_all_types

convert_bounding_box_to_wide_form_parquet_for_all_types(
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given bounding box in a wide format for all types.

Automatically downloads the Overture Maps dataset for a given release and all available theme/type pairs concurrently and returns the path to a single file with multiple columns based on the dataset schema.

PARAMETER DESCRIPTION
bbox

Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax.

TYPE: tuple[float, float, float, float]

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always include the same list of columns in the resulting file. This guarantees an identical set of columns for a given release regardless of region, which also means that some columns may contain only False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None.

TYPE: Optional[Union[int, list[Optional[int]]]] DEFAULT: None

pyarrow_filters

A list of pyarrow expressions, one per theme/type pair, used to filter each pair. An entry may be None. Must be the same length as the list of theme/type pairs. Defaults to None.

TYPE: Optional[list[Optional[PYARROW_FILTER]]] DEFAULT: None

result_file_path

Where to save the geoparquet file. If not provided, will be generated based on hashes from filters. Defaults to None.

TYPE: Optional[Union[str, Path]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Sets the progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient shows progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
Path

Path to the generated GeoParquet file.

TYPE: Path

Source code in overturemaestro/advanced_functions/functions.py
def convert_bounding_box_to_wide_form_parquet_for_all_types(
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given bounding box in a wide format for all types.

    Automatically downloads Overture Maps dataset for a given release and all available theme/types
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    return convert_geometry_to_wide_form_parquet_for_all_types(
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
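
The function above only wraps `bbox` in a rectangle via `box(*bbox)` and delegates to the geometry-based variant, so the order of the four values matters. Below is a minimal sketch of a pre-flight check. `validate_bbox` is a hypothetical helper, not part of OvertureMaestro, and the range check assumes longitude/latitude in degrees.

```python
def validate_bbox(
    bbox: tuple[float, float, float, float],
) -> tuple[float, float, float, float]:
    """Check that a bounding box follows the (xmin, ymin, xmax, ymax) order."""
    if len(bbox) != 4:
        raise ValueError("bbox must contain exactly four values")
    xmin, ymin, xmax, ymax = bbox
    # A swapped pair would describe an empty or inverted rectangle.
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("expected xmin < xmax and ymin < ymax")
    # Assumption: coordinates are longitude/latitude in degrees.
    if xmin < -180 or xmax > 180 or ymin < -90 or ymax > 90:
        raise ValueError("coordinates fall outside the lon/lat range")
    return (float(xmin), float(ymin), float(xmax), float(ymax))


# Illustrative bounding box (values chosen for the example only).
example_bbox = validate_bbox((7.41, 43.72, 7.44, 43.75))
```

Running the check before a call fails fast on swapped values instead of after a download has started.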

convert_bounding_box_to_wide_form_parquet_for_multiple_types

convert_bounding_box_to_wide_form_parquet_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given bounding box in a wide format for multiple types.

Automatically downloads Overture Maps dataset for a given release and theme/type pairs in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER DESCRIPTION
theme_type_pairs

Pairs of themes and types of the dataset.

TYPE: list[tuple[str, str]]

bbox

Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax.

TYPE: tuple[float, float, float, float]

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None.

TYPE: Optional[Union[int, list[Optional[int]]]] DEFAULT: None

pyarrow_filters

A list of pyarrow expressions, each used to filter one specific theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None.

TYPE: Optional[list[Optional[PYARROW_FILTER]]] DEFAULT: None

result_file_path

Where to save the geoparquet file. If not provided, will be generated based on hashes from filters. Defaults to None.

TYPE: Optional[Union[str, Path]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient tracks progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
Path

Path to the generated GeoParquet file.

TYPE: Path

Source code in overturemaestro/advanced_functions/functions.py
def convert_bounding_box_to_wide_form_parquet_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given bounding box in a wide format for multiple types.

    Automatically downloads Overture Maps dataset for a given release and theme/type pairs
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        theme_type_pairs (list[tuple[str, str]]): Pairs of themes and types of the dataset.
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    return convert_geometry_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=theme_type_pairs,
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
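
A usage sketch for the function above, based only on the signature shown. The import path is assumed from the module name `overturemaestro.advanced_functions`. The theme/type pairs, the depth value, and the output file name are illustrative assumptions, and the import is deferred into `run()` so that the arguments can be inspected without triggering a download.

```python
from pathlib import Path

# Illustrative inputs; the pairs are assumed to exist in the chosen release.
theme_type_pairs = [("places", "place"), ("buildings", "building")]
bbox = (7.41, 43.72, 7.44, 43.75)  # xmin, ymin, xmax, ymax

call_kwargs = {
    "theme_type_pairs": theme_type_pairs,
    "bbox": bbox,
    # One entry per theme/type pair; None means "use all available columns".
    # The depth of 1 for the second pair is an assumed valid value.
    "hierarchy_depth": [None, 1],
    "pyarrow_filters": None,
    # Hypothetical output location.
    "result_file_path": Path("files") / "example_wide.parquet",
    "verbosity_mode": "silent",
}


def run() -> Path:
    """Download the data and return the path to the generated GeoParquet file."""
    # Imported lazily: requires the overturemaestro package and network access.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_parquet_for_multiple_types,
    )

    return convert_bounding_box_to_wide_form_parquet_for_multiple_types(**call_kwargs)
```

Calling `run()` performs the actual download. With `release` left unset, the newest available release is used, as described above.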

convert_geometry_to_wide_form_geodataframe

convert_geometry_to_wide_form_geodataframe(
    theme: str,
    type: str,
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given geometry in a wide format.

Automatically downloads Overture Maps dataset for a given release and theme/type in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.
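
To make "wide format" and the effect of `include_all_possible_columns` concrete, here is a small pure-Python sketch. It is not the library's implementation. The sample categories and the `|`-joined column names are made up for illustration, and real columns are derived from the dataset schema.

```python
# Long form: one category label per feature.
features = [
    {"id": "a", "category": "water|river"},
    {"id": "b", "category": "water|lake"},
    {"id": "c", "category": "water|river"},
]

# Hypothetical full schema for a release, including a category absent here.
all_possible_columns = ["water|canal", "water|lake", "water|river"]


def to_wide(rows: list[dict], columns: list[str]) -> list[dict]:
    """Turn each category label into one boolean column per known category."""
    return [
        {"id": row["id"], **{col: row["category"] == col for col in columns}}
        for row in rows
    ]


# Fixed column set: stable across regions, but "water|canal" is all False.
wide_fixed = to_wide(features, all_possible_columns)

# Only the columns that occur in this region.
present_columns = sorted({row["category"] for row in features})
wide_present = to_wide(features, present_columns)
```

The fixed variant is easier to concatenate across regions. The second variant is narrower, but its column set depends on the data.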

PARAMETER DESCRIPTION
theme

Theme of the dataset.

TYPE: str

type

Type of the dataset.

TYPE: str

geometry_filter

Geometry used to filter data.

TYPE: BaseGeometry

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the resulting GeoDataFrame. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. If None, will use all available columns. Defaults to None.

TYPE: Optional[int] DEFAULT: None

pyarrow_filters

A pyarrow expression used to filter the given theme/type pair. Defaults to None.

TYPE: Optional[PYARROW_FILTER] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient tracks progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
GeoDataFrame

GeoDataFrame with Overture Maps features.

TYPE: gpd.GeoDataFrame

Source code in overturemaestro/advanced_functions/functions.py
def convert_geometry_to_wide_form_geodataframe(
    theme: str,
    type: str,
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given geometry in a wide format.

    Automatically downloads Overture Maps dataset for a given release and theme/type
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        theme (str): Theme of the dataset.
        type (str): Type of the dataset.
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[int], optional): Depth used to calculate how many hierarchy
            columns should be used to generate the wide form of the data. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[PYARROW_FILTER], optional): A pyarrow expression used to filter
            specific theme type pair. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    parsed_geoparquet_file = convert_geometry_to_wide_form_parquet(
        theme=theme,
        type=type,
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
    return geodataframe_from_parquet(parsed_geoparquet_file)
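
The two `places_*` parameters can be pictured with a small pure-Python sketch. It is an illustration under assumptions, not the library's logic: the record layout (`confidence`, `categories` with `primary` and `alternate`) is a simplified stand-in for the places data, the sample rows are invented, and treating the threshold as inclusive is an assumption.

```python
places = [
    {"name": "Cafe A", "confidence": 0.91,
     "categories": {"primary": "cafe", "alternate": ["bakery"]}},
    {"name": "Shop B", "confidence": 0.42,
     "categories": {"primary": "clothing_store", "alternate": []}},
    {"name": "Bar C", "confidence": 0.75,
     "categories": {"primary": "bar", "alternate": ["pub"]}},
]


def select_categories(
    rows: list[dict],
    minimal_confidence: float = 0.75,
    primary_only: bool = False,
) -> dict[str, list[str]]:
    """Keep confident places and collect the categories that would become columns."""
    selected = {}
    for row in rows:
        # Assumption: a place exactly at the threshold is kept.
        if row["confidence"] < minimal_confidence:
            continue
        categories = [row["categories"]["primary"]]
        if not primary_only:
            categories += row["categories"]["alternate"]
        selected[row["name"]] = categories
    return selected


with_alternates = select_categories(places)
primary_only = select_categories(places, primary_only=True)
```

In this sketch, raising the confidence threshold trims rows, while restricting to the primary category trims how many category columns a single place can set.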

convert_geometry_to_wide_form_geodataframe_for_all_types

convert_geometry_to_wide_form_geodataframe_for_all_types(
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given geometry in a wide format for all types.

Automatically downloads Overture Maps dataset for a given release and all available theme/types in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.

PARAMETER DESCRIPTION
geometry_filter

Geometry used to filter data.

TYPE: BaseGeometry

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the resulting GeoDataFrame. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None.

TYPE: Optional[Union[int, list[Optional[int]]]] DEFAULT: None

pyarrow_filters

A list of pyarrow expressions, each used to filter one specific theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None.

TYPE: Optional[list[Optional[PYARROW_FILTER]]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient tracks progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
GeoDataFrame

GeoDataFrame with Overture Maps features.

TYPE: gpd.GeoDataFrame

Source code in overturemaestro/advanced_functions/functions.py
def convert_geometry_to_wide_form_geodataframe_for_all_types(
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given geometry in a wide format for all types.

    Automatically downloads Overture Maps dataset for a given release and all available theme/types
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    parsed_geoparquet_file = convert_geometry_to_wide_form_parquet_for_all_types(
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
    return geodataframe_from_parquet(parsed_geoparquet_file)
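
The `hierarchy_depth` parameter limits how many hierarchy levels feed the wide-form columns. The sketch below shows the general idea in pure Python. The sample hierarchy paths and the `|` separator are assumptions for illustration, and `hierarchy_column` is a hypothetical helper rather than a library function.

```python
from typing import Optional


def hierarchy_column(path: list[str], depth: Optional[int]) -> str:
    """Join the first `depth` hierarchy levels; None keeps every level."""
    levels = path if depth is None else path[:depth]
    return "|".join(levels)


# Invented two-level hierarchy values for three features.
building_paths = [
    ["agricultural", "barn"],
    ["agricultural", "farm"],
    ["residential", "house"],
]

# depth=None: every distinct full path becomes its own column.
full_columns = sorted({hierarchy_column(p, None) for p in building_paths})

# depth=1: paths sharing the first level collapse into one column.
shallow_columns = sorted({hierarchy_column(p, 1) for p in building_paths})
```

A smaller depth yields fewer, coarser columns. `None` keeps the full detail.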

convert_geometry_to_wide_form_geodataframe_for_multiple_types

convert_geometry_to_wide_form_geodataframe_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given geometry in a wide format for multiple types.

Automatically downloads Overture Maps dataset for a given release and theme/type pairs in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.

PARAMETER DESCRIPTION
theme_type_pairs

Pairs of themes and types of the dataset.

TYPE: list[tuple[str, str]]

geometry_filter

Geometry used to filter data.

TYPE: BaseGeometry

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the resulting GeoDataFrame. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None.

TYPE: Optional[Union[int, list[Optional[int]]]] DEFAULT: None

pyarrow_filters

A list of pyarrow expressions, each used to filter one specific theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None.

TYPE: Optional[list[Optional[PYARROW_FILTER]]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Progress verbosity mode. One of: silent, transient, or verbose. Silent disables output completely. Transient tracks progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
GeoDataFrame

GeoDataFrame with Overture Maps features.

TYPE: gpd.GeoDataFrame

Source code in overturemaestro/advanced_functions/functions.py
def convert_geometry_to_wide_form_geodataframe_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given geometry in a wide format for multiple types.

    Automatically downloads Overture Maps dataset for a given release and theme/type pairs
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        theme_type_pairs (list[tuple[str, str]]): Pairs of themes and types of the dataset.
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to always return the same list of
            columns in the resulting file. This ensures that the same set of columns is returned
            for a given release regardless of region. It also means that some columns might be
            filled entirely with False values. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. A list must be the same
            length as the list of theme/type pairs. If None, will use all available columns.
            Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions, one per theme/type pair. Must be the same length as the list
            of theme/type pairs. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress but removes the output once finished.
            Verbose leaves all progress output in stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    parsed_geoparquet_file = convert_geometry_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=theme_type_pairs,
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
    return geodataframe_from_parquet(parsed_geoparquet_file)

convert_geometry_to_wide_form_parquet

convert_geometry_to_wide_form_parquet(
    theme: str,
    type: str,
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given geometry in a wide format.

Automatically downloads Overture Maps dataset for a given release and theme/type in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER DESCRIPTION
theme

Theme of the dataset.

TYPE: str

type

Type of the dataset.

TYPE: str

geometry_filter

Geometry used to filter data.

TYPE: BaseGeometry

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release regardless of region. It also means that some columns might be filled entirely with False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. If None, will use all available columns. Defaults to None.

TYPE: Optional[int] DEFAULT: None

pyarrow_filters

A pyarrow expression used to filter the given theme/type pair. Defaults to None.

TYPE: Optional[PYARROW_FILTER] DEFAULT: None

result_file_path

Where to save the geoparquet file. If not provided, will be generated based on hashes from filters. Defaults to None.

TYPE: Optional[Union[str, Path]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category for places. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
Path

Path to the generated GeoParquet file.

TYPE: Path
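
A minimal usage sketch for the signature above, assuming `overturemaestro` and `shapely` are installed and network access is available. The `buildings`/`building` pair, the argument values, and the helper names `example_request` and `download_example` are illustrative assumptions, not part of the library.

```python
from pathlib import Path
from typing import Any


def example_request(working_directory: str = "files") -> dict[str, Any]:
    """Build keyword arguments for one wide-form request (illustrative values)."""
    return {
        "theme": "buildings",
        "type": "building",
        "release": None,  # None lets the library pick the newest release
        "hierarchy_depth": 1,  # limit the number of hierarchy columns used
        "ignore_cache": False,  # reuse a previously generated file if present
        "working_directory": Path(working_directory),
        "verbosity_mode": "silent",
    }


def download_example(bbox: tuple[float, float, float, float]) -> Path:
    """Request the data for a bounding box and return the GeoParquet path."""
    # Imported lazily so this module loads without the optional dependencies.
    from shapely.geometry import box
    from overturemaestro.advanced_functions import (
        convert_geometry_to_wide_form_parquet,
    )

    return convert_geometry_to_wide_form_parquet(
        geometry_filter=box(*bbox),  # box takes xmin, ymin, xmax, ymax
        **example_request(),
    )
```

Everything after `release` in the signature is keyword-only, so collecting the options in a dictionary and unpacking it keeps the call readable.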

Source code in overturemaestro/advanced_functions/functions.py
def convert_geometry_to_wide_form_parquet(
    theme: str,
    type: str,
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given geometry in a wide format.

    Automatically downloads Overture Maps dataset for a given release and theme/type
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        theme (str): Theme of the dataset.
        type (str): Type of the dataset.
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to always return the same list of
            columns in the resulting file. This ensures that the same set of columns is returned
            for a given release regardless of region. It also means that some columns might be
            filled entirely with False values. Defaults to True.
        hierarchy_depth (Optional[int], optional): Depth used to calculate how many hierarchy
            columns should be used to generate the wide form of the data. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[PYARROW_FILTER], optional): A pyarrow expression used to filter
            the given theme/type pair. Defaults to None.
        result_file_path (Optional[Union[str, Path]], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress but removes the output once finished.
            Verbose leaves all progress output in stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    return convert_geometry_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=[(theme, type)],
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=[pyarrow_filters],
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
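
The source above forwards to the multi-pair function by wrapping its scalar arguments in one-element lists. The following sketch imitates that delegation with stand-in functions. `process_pairs` and `process_single` are hypothetical names used only for illustration; they do not download anything.

```python
from typing import Any, Optional


def process_pairs(
    theme_type_pairs: list[tuple[str, str]],
    pyarrow_filters: Optional[list[Optional[Any]]] = None,
) -> list[tuple[str, str, Optional[Any]]]:
    """Stand-in for the multi-pair function: attach each filter to its pair."""
    filters = (
        pyarrow_filters
        if pyarrow_filters is not None
        else [None] * len(theme_type_pairs)
    )
    return [
        (theme, type_, pyarrow_filter)
        for (theme, type_), pyarrow_filter in zip(theme_type_pairs, filters)
    ]


def process_single(
    theme: str, type_: str, pyarrow_filter: Optional[Any] = None
) -> list[tuple[str, str, Optional[Any]]]:
    """Stand-in for the single-pair wrapper: wrap scalars in one-element lists."""
    # Even when pyarrow_filter is None, [None] has length 1 and matches the single pair.
    return process_pairs([(theme, type_)], pyarrow_filters=[pyarrow_filter])
```

Because the filter is always wrapped, the wrapper never trips the length check that the multi-pair function performs on `pyarrow_filters`.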

convert_geometry_to_wide_form_parquet_for_all_types

convert_geometry_to_wide_form_parquet_for_all_types(
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given geometry in a wide format for all types.

Automatically downloads Overture Maps dataset for a given release and all available theme/types in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER DESCRIPTION
geometry_filter

Geometry used to filter data.

TYPE: BaseGeometry

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release regardless of region. It also means that some columns might be filled entirely with False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers; a list must be the same length as the list of theme/type pairs. If None, will use all available columns. Defaults to None.

TYPE: Optional[Union[int, list[Optional[int]]]] DEFAULT: None

pyarrow_filters

A list of pyarrow expressions, one per theme/type pair. Must be the same length as the list of theme/type pairs, which here is every pair available for the release. Defaults to None.

TYPE: Optional[list[Optional[PYARROW_FILTER]]] DEFAULT: None

result_file_path

Where to save the geoparquet file. If not provided, will be generated based on hashes from filters. Defaults to None.

TYPE: Optional[Union[str, Path]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category from the places dataset. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
Path

Path to the generated GeoParquet file.

TYPE: Path

Source code in overturemaestro/advanced_functions/wide_form.py
def convert_geometry_to_wide_form_parquet_for_all_types(
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given geometry in a wide format for all types.

    Automatically downloads Overture Maps dataset for a given release and all available theme/types
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to always return the same list of
            columns in the resulting file. This ensures that the same set of columns is returned
            for a given release regardless of region. It also means that some columns might be
            filled entirely with False values. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. A list must be the same
            length as the list of theme/type pairs. If None, will use all available columns.
            Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions, one per theme/type pair. Must be the same length as the list of
            theme/type pairs, which here is every pair available for the release.
            Defaults to None.
        result_file_path (Optional[Union[str, Path]], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress but removes the output once finished.
            Verbose leaves all progress output in stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary
            category from the places dataset. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    if not release:
        release = get_newest_release_version()

    return convert_geometry_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=list(get_theme_type_classification(release=release).keys()),
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
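
In the source above, the pair list comes from the keys of `get_theme_type_classification(release=release)`, so a caller who passes `pyarrow_filters` has to supply one entry per pair in that same order. The sketch below shows one way to build such a list from a sparse mapping. `align_filters` is a hypothetical helper, and the pairs and the filter tuple in the usage note are sample data, not values confirmed by the library.

```python
from typing import Any, Optional

Pair = tuple[str, str]


def align_filters(
    theme_type_pairs: list[Pair],
    filters_by_pair: dict[Pair, Any],
) -> list[Optional[Any]]:
    """Return one filter (or None) per pair, in the same order as the pairs."""
    unknown = set(filters_by_pair) - set(theme_type_pairs)
    if unknown:
        # A filter for a pair that is not requested would otherwise be dropped silently.
        raise KeyError(f"Filters given for unknown pairs: {sorted(unknown)}")
    return [filters_by_pair.get(pair) for pair in theme_type_pairs]
```

For example, `align_filters([("base", "water"), ("places", "place")], {("places", "place"): [("confidence", ">", 0.9)]})` yields a two-element list whose first entry is `None`, which satisfies the length requirement while filtering only one pair.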

convert_geometry_to_wide_form_parquet_for_multiple_types

convert_geometry_to_wide_form_parquet_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given geometry in a wide format for multiple types.

Automatically downloads Overture Maps dataset for a given release and theme/type pairs in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER DESCRIPTION
theme_type_pairs

Pairs of themes and types of the dataset.

TYPE: list[tuple[str, str]]

geometry_filter

Geometry used to filter data.

TYPE: BaseGeometry

release

Release version. If not provided, will automatically load newest available release version. Defaults to None.

TYPE: Optional[str] DEFAULT: None

include_all_possible_columns

Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release regardless of region. It also means that some columns might be filled entirely with False values. Defaults to True.

TYPE: bool DEFAULT: True

hierarchy_depth

Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers; a list must be the same length as the list of theme/type pairs. If None, will use all available columns. Defaults to None.

TYPE: Optional[Union[int, list[Optional[int]]]] DEFAULT: None

pyarrow_filters

A list of pyarrow expressions, one per theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None.

TYPE: Optional[list[Optional[PYARROW_FILTER]]] DEFAULT: None

result_file_path

Where to save the geoparquet file. If not provided, will be generated based on hashes from filters. Defaults to None.

TYPE: Optional[Union[str, Path]] DEFAULT: None

ignore_cache

Whether to ignore precalculated geoparquet files or not. Defaults to False.

TYPE: bool DEFAULT: False

working_directory

Directory where to save the downloaded *.parquet files. Defaults to "files".

TYPE: Union[str, Path] DEFAULT: 'files'

verbosity_mode

Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient".

TYPE: Literal["silent", "transient", "verbose"] DEFAULT: 'transient'

max_workers

Max number of multiprocessing workers used to process the dataset. Defaults to None.

TYPE: Optional[int] DEFAULT: None

places_use_primary_category_only

Whether to use only the primary category from the places dataset. Defaults to False.

TYPE: bool DEFAULT: False

places_minimal_confidence

Minimal confidence level for the places dataset. Defaults to 0.75.

TYPE: float DEFAULT: 0.75

RETURNS DESCRIPTION
Path

Path to the generated GeoParquet file.

TYPE: Path
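
The parameters `result_file_path` and `ignore_cache` describe a cache: when no path is given, a file name is derived from hashes of the request, and an existing file is reused unless `ignore_cache` is set. The sketch below illustrates that pattern with the standard library only. The hashing scheme, the directory layout, the sample release string, and the names `cached_result_path`, `get_or_create`, and `demo` are assumptions made for illustration; the library's own naming scheme is not shown in this section.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def cached_result_path(
    working_directory: str,
    release: str,
    theme_type_pairs: list[tuple[str, str]],
    hierarchy_depth: Optional[int],
) -> Path:
    """Derive a deterministic file name from the request (hypothetical scheme)."""
    payload = json.dumps(
        {"release": release, "pairs": theme_type_pairs, "depth": hierarchy_depth},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode()).hexdigest()[:16]
    return Path(working_directory) / release / f"{digest}.parquet"


def get_or_create(
    path: Path, build: Callable[[Path], None], ignore_cache: bool = False
) -> Path:
    """Call build(path) only if the file is missing or the cache is ignored."""
    if not path.exists() or ignore_cache:
        path.parent.mkdir(exist_ok=True, parents=True)
        build(path)
    return path


def demo() -> int:
    """Run the cache check three times and return how often the file was built."""
    builds = []

    def build(path: Path) -> None:
        builds.append(path)
        path.write_bytes(b"placeholder")

    with tempfile.TemporaryDirectory() as tmp:
        path = cached_result_path(tmp, "2024-08-20.0", [("places", "place")], None)
        get_or_create(path, build)  # file is missing, so it is built
        get_or_create(path, build)  # cache hit, nothing is built
        get_or_create(path, build, ignore_cache=True)  # forced rebuild
    return len(builds)
```

Identical requests map to the same path, so repeating a call is cheap, while changing any hashed parameter produces a different file.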

Source code in overturemaestro/advanced_functions/wide_form.py
@show_total_elapsed_time_decorator
def convert_geometry_to_wide_form_parquet_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given geometry in a wide format for multiple types.

    Automatically downloads Overture Maps dataset for a given release and theme/type pairs
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        theme_type_pairs (list[tuple[str, str]]): Pairs of themes and types of the dataset.
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to always return the same list of
            columns in the resulting file. This ensures that the same set of columns is returned
            for a given release regardless of region. It also means that some columns might be
            filled entirely with False values. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. A list must be the same
            length as the list of theme/type pairs. If None, will use all available columns.
            Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions, one per theme/type pair. Must be the same length as the list
            of theme/type pairs. Defaults to None.
        result_file_path (Optional[Union[str, Path]], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress but removes the output once finished.
            Verbose leaves all progress output in stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        places_use_primary_category_only (bool, optional): Whether to use only the primary
            category from the places dataset. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    if pyarrow_filters is not None and len(theme_type_pairs) != len(pyarrow_filters):
        raise ValueError("Pyarrow filters length doesn't match length of theme type pairs.")

    if isinstance(hierarchy_depth, list) and len(theme_type_pairs) != len(hierarchy_depth):
        raise ValueError("Hierarchy depth list length doesn't match length of theme type pairs.")

    if not release:
        release = get_newest_release_version()

    pyarrow_filters_list = []
    for idx in range(len(theme_type_pairs)):
        _pyarrow_filter = pyarrow_filters[idx] if pyarrow_filters else None

        if _pyarrow_filter is not None:
            from pyarrow.parquet import filters_to_expression

            _pyarrow_filter = filters_to_expression(_pyarrow_filter)

        pyarrow_filters_list.append(_pyarrow_filter)

    if result_file_path is None:
        result_file_path = Path(working_directory) / _generate_result_file_path(
            release=release,
            theme_type_pairs=theme_type_pairs,
            geometry_filter=geometry_filter,
            include_all_possible_columns=include_all_possible_columns,
            hierarchy_depth=hierarchy_depth,
            pyarrow_filters=pyarrow_filters_list,
            places_use_primary_category_only=places_use_primary_category_only,
            places_minimal_confidence=places_minimal_confidence,
        )

    result_file_path = Path(result_file_path)

    if not result_file_path.exists() or ignore_cache:
        result_file_path.parent.mkdir(exist_ok=True, parents=True)

        prepared_download_parameters = _prepare_download_parameters_for_all_theme_type_pairs(
            release=release,
            theme_type_pairs=theme_type_pairs,
            geometry_filter=geometry_filter,
            hierarchy_depth=hierarchy_depth,
            pyarrow_filters=pyarrow_filters_list,
            verbosity_mode=verbosity_mode,
            places_minimal_confidence=places_minimal_confidence,
        )

        hierachy_columns_list, columns_to_download_list, pyarrow_filter_list = zip(
            *prepared_download_parameters
        )

        downloaded_parquet_files = download_data_for_multiple_types(
            release=release,
            theme_type_pairs=theme_type_pairs,
            geometry_filter=geometry_filter,
            pyarrow_filters=pyarrow_filter_list,
            columns_to_download=[
                ["id", "geometry", *columns_to_download]
                for columns_to_download in columns_to_download_list
            ],
            ignore_cache=ignore_cache,
            working_directory=working_directory,
            verbosity_mode=verbosity_mode,
            max_workers=max_workers,
        )

        with tempfile.TemporaryDirectory(dir=Path(working_directory).resolve()) as tmp_dir_name:
            tmp_dir_path = Path(tmp_dir_name)

            transformed_wide_form_directory_output = tmp_dir_path / "wide_form_files"
            transformed_wide_form_directory_output.mkdir(parents=True, exist_ok=True)

            with TrackProgressBar(verbosity_mode=verbosity_mode) as progress:
                for (
                    (theme_value, type_value),
                    hierachy_columns,
                    downloaded_parquet_file,
                ) in progress.track(
                    zip(theme_type_pairs, hierachy_columns_list, downloaded_parquet_files),
                    total=len(theme_type_pairs),
                    description="Transforming data into wide form",
                ):
                    wide_form_definition = get_theme_type_classification(release=release)[
                        (theme_value, type_value)
                    ]

                    output_path = (
                        transformed_wide_form_directory_output
                        / f"{theme_value}_{type_value}.parquet"
                    )
                    if len(theme_type_pairs) == 1:
                        output_path = result_file_path

                    if not hierachy_columns:
                        _transform_to_wide_form_without_hierarchy(
                            theme=theme_value,
                            type=type_value,
                            parquet_file=downloaded_parquet_file,
                            output_path=output_path,
                            working_directory=tmp_dir_path,
                        )
                    else:
                        wide_form_definition.data_transform_function(
                            theme=theme_value,
                            type=type_value,
                            release_version=release,
                            parquet_file=downloaded_parquet_file,
                            output_path=output_path,
                            include_all_possible_columns=include_all_possible_columns,
                            hierarchy_columns=hierachy_columns,
                            working_directory=tmp_dir_path,
                            verbosity_mode=verbosity_mode,
                            places_use_primary_category_only=places_use_primary_category_only,
                        )

            if len(theme_type_pairs) > 1:
                with TrackProgressSpinner(
                    "Joining results to a single file", verbosity_mode=verbosity_mode
                ):
                    _combine_multiple_wide_form_files(
                        theme_type_pairs=theme_type_pairs,
                        transformed_wide_form_directory=transformed_wide_form_directory_output,
                        output_path=result_file_path,
                        working_directory=tmp_dir_path,
                    )

    return result_file_path
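
The function above starts by checking that per-pair arguments line up with `theme_type_pairs`. The sketch below reproduces those two checks in isolation and pairs each theme/type with its depth and filter. The error messages are copied from the source; the name `normalize_per_pair_arguments` is hypothetical, and applying a single integer depth to every pair is an assumption, since the helper that interprets `hierarchy_depth` is not shown in this section.

```python
from typing import Any, Optional, Union

Pair = tuple[str, str]


def normalize_per_pair_arguments(
    theme_type_pairs: list[Pair],
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[Any]]] = None,
) -> list[tuple[Pair, Optional[int], Optional[Any]]]:
    """Validate list lengths and return (pair, depth, filter) triples."""
    count = len(theme_type_pairs)

    if pyarrow_filters is not None and count != len(pyarrow_filters):
        raise ValueError("Pyarrow filters length doesn't match length of theme type pairs.")

    if isinstance(hierarchy_depth, list) and count != len(hierarchy_depth):
        raise ValueError("Hierarchy depth list length doesn't match length of theme type pairs.")

    # Assumption: a scalar depth (or None) is applied to every pair.
    depths = (
        hierarchy_depth
        if isinstance(hierarchy_depth, list)
        else [hierarchy_depth] * count
    )
    filters = pyarrow_filters if pyarrow_filters is not None else [None] * count

    return list(zip(theme_type_pairs, depths, filters))
```

Checking the lengths before any download starts means a mismatched argument fails immediately instead of after a long transfer.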