Advanced functions

overturemaestro.advanced_functions

Advanced functions.

This module contains dedicated functions for specific use cases.

convert_bounding_box_to_wide_form_geodataframe

convert_bounding_box_to_wide_form_geodataframe(
    theme: str,
    type: str,
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given bounding box in a wide format.

Automatically downloads Overture Maps dataset for a given release and theme/type in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`theme`	Theme of the dataset. TYPE: `str`
`type`	Type of the dataset. TYPE: `str`
`bbox`	Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax. TYPE: `tuple[float, float, float, float]`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release regardless of region. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. If None, will use all available columns. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`pyarrow_filters`	A pyarrow expression used to filter the given theme/type pair. Defaults to None. TYPE: `Optional[PYARROW_FILTER]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`GeoDataFrame`	GeoDataFrame with Overture Maps features. TYPE: `gpd.GeoDataFrame`
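
A minimal usage sketch, assuming the function can be imported from `overturemaestro.advanced_functions` as the module path on this page suggests. The theme/type pair `buildings`/`building`, the coordinates, and the helper name `load_buildings_wide` are illustrative choices rather than values taken from this page. The import is deferred into the helper so that defining it does not trigger any download.

```python
# Illustrative bounding box in (xmin, ymin, xmax, ymax) order.
# The coordinates are an arbitrary example, assumed to be longitude/latitude.
EXAMPLE_BBOX = (7.416, 43.730, 7.421, 43.733)


def load_buildings_wide(bbox=EXAMPLE_BBOX, depth=1):
    """Return a wide-form GeoDataFrame of buildings inside ``bbox``.

    Calling this helper downloads data, so it needs network access and
    the ``overturemaestro`` package installed.
    """
    # Deferred import: the package is only required when the helper is called.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_geodataframe,
    )

    return convert_bounding_box_to_wide_form_geodataframe(
        theme="buildings",  # assumed theme name
        type="building",  # assumed type name
        bbox=bbox,
        # Everything after `release` is keyword-only in the signature.
        hierarchy_depth=depth,
        verbosity_mode="silent",
    )
```

Calling `load_buildings_wide()` should return a `gpd.GeoDataFrame`; the exact columns depend on the release and on `hierarchy_depth`.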

Source code in overturemaestro/advanced_functions/functions.py

def convert_bounding_box_to_wide_form_geodataframe(
    theme: str,
    type: str,
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given bounding box in a wide format.

    Automatically downloads Overture Maps dataset for a given release and theme/type
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        theme (str): Theme of the dataset.
        type (str): Type of the dataset.
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[int], optional): Depth used to calculate how many hierarchy
            columns should be used to generate the wide form of the data. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[PYARROW_FILTER], optional): A pyarrow expression used to filter
            specific theme type pair. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    return convert_geometry_to_wide_form_geodataframe(
        theme=theme,
        type=type,
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
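
The wrapper above unpacks `bbox` straight into `box(*bbox)`, so the order xmin, ymin, xmax, ymax matters. The sketch below is a hypothetical pre-check, not part of the library: it assumes longitude/latitude coordinates and rejects tuples whose values are reversed or out of range. The name `validate_bbox` is made up for illustration.

```python
def validate_bbox(bbox):
    """Check a bounding box given as (xmin, ymin, xmax, ymax).

    Hypothetical helper, not part of overturemaestro. Assumes the values
    are longitude (x) and latitude (y) in degrees.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    xmin, ymin, xmax, ymax = (float(value) for value in bbox)
    # Minimum values must come before maximum values.
    if not (xmin < xmax and ymin < ymax):
        raise ValueError("expected xmin < xmax and ymin < ymax")
    # Longitude must stay within [-180, 180] and latitude within [-90, 90].
    if not (-180.0 <= xmin and xmax <= 180.0 and -90.0 <= ymin and ymax <= 90.0):
        raise ValueError("coordinates fall outside the longitude/latitude range")
    return (xmin, ymin, xmax, ymax)
```

This check cannot detect every swapped longitude/latitude pair, since many latitude values are also valid longitudes; it only catches swaps that push a value outside the latitude range.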

convert_bounding_box_to_wide_form_geodataframe_for_all_types

convert_bounding_box_to_wide_form_geodataframe_for_all_types(
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given bounding box in a wide format for all types.

Automatically downloads Overture Maps dataset for a given release and all available theme/types in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`bbox`	Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax. TYPE: `tuple[float, float, float, float]`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release regardless of region. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None. TYPE: `Optional[Union[int, list[Optional[int]]]]` DEFAULT: `None`
`pyarrow_filters`	A list of pyarrow expressions, each used to filter a specific theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None. TYPE: `Optional[list[Optional[PYARROW_FILTER]]]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`GeoDataFrame`	GeoDataFrame with Overture Maps features. TYPE: `gpd.GeoDataFrame`
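
To make the term "wide format" concrete, here is a conceptual sketch using only the standard library. It is not the library's implementation: the function `to_wide_form`, the record layout, and the column names are all hypothetical. It pivots (feature, category) records into one boolean column per category and shows why passing a fixed column list, the idea behind `include_all_possible_columns`, can produce columns that are False for every feature.

```python
def to_wide_form(records, all_columns=None):
    """Pivot (feature_id, category) records into boolean columns.

    Hypothetical illustration. When ``all_columns`` is given, every listed
    column appears in each row, even if no record matches it.
    """
    if all_columns is None:
        # Only keep the categories that actually occur in the input.
        columns = sorted({category for _, category in records})
    else:
        columns = list(all_columns)

    rows = {}
    for feature_id, category in records:
        # Start each feature with every column set to False.
        row = rows.setdefault(feature_id, dict.fromkeys(columns, False))
        if category in row:
            row[category] = True
    return rows


# Made-up records and column names, for illustration only.
records = [("f1", "water|river"), ("f2", "water|lake")]
schema = ["water|lake", "water|river", "water|canal"]

compact = to_wide_form(records)
stable = to_wide_form(records, all_columns=schema)
```

Here `compact` has two columns per feature, while `stable` also carries `water|canal`, which is False for both features because no record mentions it.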

Source code in overturemaestro/advanced_functions/functions.py

def convert_bounding_box_to_wide_form_geodataframe_for_all_types(
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given bounding box in a wide format for all types.

    Automatically downloads Overture Maps dataset for a given release and all available theme/types
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    return convert_geometry_to_wide_form_geodataframe_for_all_types(
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )

convert_bounding_box_to_wide_form_geodataframe_for_multiple_types

convert_bounding_box_to_wide_form_geodataframe_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given bounding box in a wide format for multiple types.

Automatically downloads Overture Maps dataset for a given release and theme/type pairs in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`theme_type_pairs`	Pairs of themes and types of the dataset. TYPE: `list[tuple[str, str]]`
`bbox`	Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax. TYPE: `tuple[float, float, float, float]`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release regardless of region. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None. TYPE: `Optional[Union[int, list[Optional[int]]]]` DEFAULT: `None`
`pyarrow_filters`	A list of pyarrow expressions, each used to filter a specific theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None. TYPE: `Optional[list[Optional[PYARROW_FILTER]]]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`GeoDataFrame`	GeoDataFrame with Overture Maps features. TYPE: `gpd.GeoDataFrame`
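
The per-pair arguments can be checked before calling the function. The sketch below is a hypothetical pre-check, not the library's own logic. It assumes that a single `hierarchy_depth` value applies to every pair and that a list must contain one entry per pair; only the length rule for `pyarrow_filters` is stated on this page. The helper names and the example pairs are made up for illustration.

```python
def expand_hierarchy_depth(hierarchy_depth, theme_type_pairs):
    """Return one depth per theme/type pair.

    Hypothetical helper. A single int (or None) is repeated for every pair;
    a list is assumed to need exactly one entry per pair.
    """
    if isinstance(hierarchy_depth, list):
        if len(hierarchy_depth) != len(theme_type_pairs):
            raise ValueError(
                f"hierarchy_depth has {len(hierarchy_depth)} entries, "
                f"expected {len(theme_type_pairs)}"
            )
        return list(hierarchy_depth)
    return [hierarchy_depth] * len(theme_type_pairs)


def check_filters_length(pyarrow_filters, theme_type_pairs):
    """Raise if a filter list does not match the number of pairs."""
    if pyarrow_filters is not None and len(pyarrow_filters) != len(theme_type_pairs):
        raise ValueError(
            f"pyarrow_filters has {len(pyarrow_filters)} entries, "
            f"expected {len(theme_type_pairs)}"
        )


# Example pairs; the theme and type names are assumptions.
pairs = [("buildings", "building"), ("places", "place")]
same_depth = expand_hierarchy_depth(1, pairs)
mixed_depth = expand_hierarchy_depth([2, None], pairs)
check_filters_length([None, None], pairs)
```

A `None` entry in either list leaves the corresponding pair unrestricted, matching the `Optional` element types in the signature.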

Source code in overturemaestro/advanced_functions/functions.py

def convert_bounding_box_to_wide_form_geodataframe_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given bounding box in a wide format for multiple types.

    Automatically downloads Overture Maps dataset for a given release and theme/type pairs
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        theme_type_pairs (list[tuple[str, str]]): Pairs of themes and types of the dataset.
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    return convert_geometry_to_wide_form_geodataframe_for_multiple_types(
        theme_type_pairs=theme_type_pairs,
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )

convert_bounding_box_to_wide_form_parquet

convert_bounding_box_to_wide_form_parquet(
    theme: str,
    type: str,
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given bounding box in a wide format.

Automatically downloads Overture Maps dataset for a given release and theme/type in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`theme`	Theme of the dataset. TYPE: `str`
`type`	Type of the dataset. TYPE: `str`
`bbox`	Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax. TYPE: `tuple[float, float, float, float]`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release regardless of region. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. If None, will use all available columns. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`pyarrow_filters`	A pyarrow expression used to filter the given theme/type pair. Defaults to None. TYPE: `Optional[PYARROW_FILTER]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`result_file_path`	Where to save the geoparquet file. If not provided, will be generated based on hashes from filters. Defaults to None. TYPE: `Optional[Union[str, Path]]` DEFAULT: `None`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`Path`	Path to the generated GeoParquet file. TYPE: `Path`
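
A minimal usage sketch for the file-based variant, assuming the function can be imported from `overturemaestro.advanced_functions`. The theme/type pair `places`/`place`, the output path, the confidence threshold, and the helper name `export_places_wide` are illustrative assumptions. As before, the import is deferred so that defining the helper does not start a download.

```python
from pathlib import Path

# Hypothetical output location; any writable path should work.
RESULT_PATH = Path("files") / "places_wide.parquet"


def export_places_wide(bbox, result_path=RESULT_PATH):
    """Write wide-form places for ``bbox`` to GeoParquet and return the path.

    Calling this helper downloads data, so it needs network access and
    the ``overturemaestro`` package installed.
    """
    # Deferred import: the package is only required when the helper is called.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_parquet,
    )

    return convert_bounding_box_to_wide_form_parquet(
        theme="places",  # assumed theme name
        type="place",  # assumed type name
        bbox=bbox,  # (xmin, ymin, xmax, ymax)
        result_file_path=result_path,
        # Places-specific options from the parameter table above.
        places_use_primary_category_only=True,
        places_minimal_confidence=0.9,
    )
```

The returned `Path` can then be opened with a GeoParquet reader such as `geopandas.read_parquet`. If `result_file_path` is omitted, the page states that a name is generated from hashes of the filters.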

Source code in overturemaestro/advanced_functions/functions.py

def convert_bounding_box_to_wide_form_parquet(
    theme: str,
    type: str,
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given bounding box in a wide format.

    Automatically downloads Overture Maps dataset for a given release and theme/type
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        theme (str): Theme of the dataset.
        type (str): Type of the dataset.
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[int], optional): Depth used to calculate how many hierarchy
            columns should be used to generate the wide form of the data. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[PYARROW_FILTER], optional): A pyarrow
            expression used to filter specific theme type pair. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    return convert_geometry_to_wide_form_parquet(
        theme=theme,
        type=type,
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )

convert_bounding_box_to_wide_form_parquet_for_all_types ¶

convert_bounding_box_to_wide_form_parquet_for_all_types(
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given bounding box in a wide format for all types.

Automatically downloads Overture Maps dataset for a given release and all available theme/types in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`bbox`	Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax. TYPE: `tuple[float, float, float, float]`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None. TYPE: `Optional[Union[int, list[Optional[int]]]]` DEFAULT: `None`
`pyarrow_filters`	A list of pyarrow expressions used to filter specific theme type pair. Must be the same length as the list of theme type pairs. Defaults to None. TYPE: `Optional[list[Optional[PYARROW_FILTER]]]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`result_file_path`	Where to save the GeoParquet file. If not provided, the path is generated from hashes of the filters. Defaults to None. TYPE: `Optional[Union[str, Path]]` DEFAULT: `None`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory in which to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`Path`	Path to the generated GeoParquet file. TYPE: `Path`
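
When the output location matters, `result_file_path` can be set explicitly instead of relying on the hash-based default. The sketch below is one possible arrangement: `all_types_result_path` and `export_all_types` are hypothetical helpers, the file naming scheme is an assumption, and the release string in the usage note is only a placeholder.

```python
from pathlib import Path
from typing import Union


def all_types_result_path(out_dir: Union[str, Path], release: str) -> Path:
    """Build a predictable output path that encodes the release (hypothetical naming)."""
    return Path(out_dir) / f"overture_all_types_{release}.parquet"


def export_all_types(
    bbox: tuple[float, float, float, float],
    release: str,
    out_dir: Union[str, Path] = "exports",
) -> Path:
    """Export every theme/type for a bounding box into a single named file."""
    # Deferred import: only needed when an export is actually requested.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_parquet_for_all_types,
    )

    result_path = all_types_result_path(out_dir, release)
    # Create the target directory up front rather than assuming the library does.
    result_path.parent.mkdir(parents=True, exist_ok=True)

    return convert_bounding_box_to_wide_form_parquet_for_all_types(
        bbox=bbox,
        release=release,
        hierarchy_depth=1,  # a single integer is shared by all theme/types
        result_file_path=result_path,
    )
```

For example, `export_all_types((7.41, 43.72, 7.44, 43.75), release="2024-08-20.0")` would write to `exports/overture_all_types_2024-08-20.0.parquet`, assuming that release is available.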

Source code in overturemaestro/advanced_functions/functions.py

def convert_bounding_box_to_wide_form_parquet_for_all_types(
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given bounding box in a wide format for all types.

    Automatically downloads Overture Maps dataset for a given release and all available theme/types
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    return convert_geometry_to_wide_form_parquet_for_all_types(
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )

convert_bounding_box_to_wide_form_parquet_for_multiple_types ¶

convert_bounding_box_to_wide_form_parquet_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given bounding box in a wide format for multiple types.

Automatically downloads Overture Maps dataset for a given release and theme/type pairs in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`theme_type_pairs`	Pairs of themes and types of the dataset. TYPE: `list[tuple[str, str]]`
`bbox`	Bounding box used to filter data. Order of values: xmin, ymin, xmax, ymax. TYPE: `tuple[float, float, float, float]`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None. TYPE: `Optional[Union[int, list[Optional[int]]]]` DEFAULT: `None`
`pyarrow_filters`	A list of pyarrow expressions used to filter specific theme type pair. Must be the same length as the list of theme type pairs. Defaults to None. TYPE: `Optional[list[Optional[PYARROW_FILTER]]]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`result_file_path`	Where to save the GeoParquet file. If not provided, the path is generated from hashes of the filters. Defaults to None. TYPE: `Optional[Union[str, Path]]` DEFAULT: `None`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory in which to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`Path`	Path to the generated GeoParquet file. TYPE: `Path`
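
Because `pyarrow_filters` must have the same length as `theme_type_pairs`, it can help to derive the per-pair lists from one mapping. The sketch below assumes that a `hierarchy_depth` list is matched to the pairs by position in the same way. `align_hierarchy_depths` and `export_pairs` are hypothetical helpers, and the theme/type pairs are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Optional

ThemeTypePair = tuple[str, str]


def align_hierarchy_depths(
    theme_type_pairs: list[ThemeTypePair],
    depth_by_pair: dict[ThemeTypePair, Optional[int]],
) -> list[Optional[int]]:
    """Return one depth per pair, in the order of theme_type_pairs.

    Pairs missing from the mapping get None (use all available columns).
    """
    unknown = set(depth_by_pair) - set(theme_type_pairs)
    if unknown:
        raise KeyError(f"Depths given for pairs that are not requested: {sorted(unknown)}")
    return [depth_by_pair.get(pair) for pair in theme_type_pairs]


def export_pairs(bbox: tuple[float, float, float, float]) -> Path:
    """Export two theme/type pairs with per-pair hierarchy depths."""
    # Deferred import: only needed when an export is actually requested.
    from overturemaestro.advanced_functions import (
        convert_bounding_box_to_wide_form_parquet_for_multiple_types,
    )

    pairs = [("places", "place"), ("buildings", "building")]  # assumed pairs
    depths = align_hierarchy_depths(pairs, {("places", "place"): 1})

    return convert_bounding_box_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=pairs,
        bbox=bbox,
        hierarchy_depth=depths,
        pyarrow_filters=[None] * len(pairs),  # no extra filtering for any pair
    )
```

Keeping the options in a mapping keyed by pair avoids positional mistakes when pairs are added or reordered.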

Source code in overturemaestro/advanced_functions/functions.py

def convert_bounding_box_to_wide_form_parquet_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    bbox: tuple[float, float, float, float],
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given bounding box in a wide format for multiple types.

    Automatically downloads Overture Maps dataset for a given release and theme/type pairs
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        theme_type_pairs (list[tuple[str, str]]): Pairs of themes and types of the dataset.
        bbox (tuple[float, float, float, float]): Bounding box used to filter data.
            Order of values: xmin, ymin, xmax, ymax.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    return convert_geometry_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=theme_type_pairs,
        geometry_filter=box(*bbox),
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )

convert_geometry_to_wide_form_geodataframe ¶

convert_geometry_to_wide_form_geodataframe(
    theme: str,
    type: str,
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given geometry in a wide format.

Automatically downloads Overture Maps dataset for a given release and theme/type in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`theme`	Theme of the dataset. TYPE: `str`
`type`	Type of the dataset. TYPE: `str`
`geometry_filter`	Geometry used to filter data. TYPE: `BaseGeometry`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. If None, will use all available columns. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`pyarrow_filters`	A pyarrow expression used to filter specific theme type pair. Defaults to None. TYPE: `Optional[PYARROW_FILTER]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory in which to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`GeoDataFrame`	GeoDataFrame with Overture Maps features. TYPE: `gpd.GeoDataFrame`
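
Since `geometry_filter` is a shapely `BaseGeometry`, any polygon can be used, not only a rectangle. The sketch below builds a GeoJSON-like mapping in plain Python and converts it with `shapely.geometry.shape` before calling the function. `triangle_geojson` and `places_in_area` are hypothetical helpers, and the `places`/`place` pair and the confidence value are assumptions chosen for illustration.

```python
from typing import Any

LonLat = tuple[float, float]


def triangle_geojson(a: LonLat, b: LonLat, c: LonLat) -> dict[str, Any]:
    """Build a GeoJSON Polygon mapping from three (lon, lat) corners.

    The ring is closed by repeating the first corner at the end.
    """
    ring = [list(a), list(b), list(c), list(a)]
    return {"type": "Polygon", "coordinates": [ring]}


def places_in_area(area_geojson: dict[str, Any]):
    """Return a GeoDataFrame of places inside a GeoJSON-like polygon."""
    # Deferred imports: only needed when a download is actually requested.
    from shapely.geometry import shape

    from overturemaestro.advanced_functions import (
        convert_geometry_to_wide_form_geodataframe,
    )

    return convert_geometry_to_wide_form_geodataframe(
        theme="places",  # assumed theme/type pair
        type="place",
        geometry_filter=shape(area_geojson),
        places_minimal_confidence=0.9,  # stricter than the 0.75 default
    )
```

Passing the result of `triangle_geojson` to `places_in_area` would return the features that fall inside the triangle.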

Source code in overturemaestro/advanced_functions/functions.py

def convert_geometry_to_wide_form_geodataframe(
    theme: str,
    type: str,
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given geometry in a wide format.

    Automatically downloads Overture Maps dataset for a given release and theme/type
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        theme (str): Theme of the dataset.
        type (str): Type of the dataset.
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[int], optional): Depth used to calculate how many hierarchy
            columns should be used to generate the wide form of the data. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[PYARROW_FILTER], optional): A pyarrow expression used to filter
            specific theme type pair. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    parsed_geoparquet_file = convert_geometry_to_wide_form_parquet(
        theme=theme,
        type=type,
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
    return geodataframe_from_parquet(parsed_geoparquet_file)

convert_geometry_to_wide_form_geodataframe_for_all_types ¶

convert_geometry_to_wide_form_geodataframe_for_all_types(
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given geometry in a wide format for all types.

Automatically downloads Overture Maps dataset for a given release and all available theme/types in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`geometry_filter`	Geometry used to filter data. TYPE: `BaseGeometry`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None. TYPE: `Optional[Union[int, list[Optional[int]]]]` DEFAULT: `None`
`pyarrow_filters`	A list of pyarrow expressions, each used to filter a specific theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None. TYPE: `Optional[list[Optional[PYARROW_FILTER]]]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes output after finished. Verbose leaves all progress outputs in the stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`GeoDataFrame`	GeoDataFrame with Overture Maps features. TYPE: `GeoDataFrame`
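
The `hierarchy_depth` and `pyarrow_filters` parameters accept per-pair values. Below is a minimal sketch of how such options could be lined up with theme/type pairs, assuming that a single integer applies to every pair and that list entries are matched by position. `normalize_per_pair_options` is a hypothetical helper, not part of the library, and the length check on a `hierarchy_depth` list is an assumption; the documentation states the length requirement only for `pyarrow_filters`.

```python
from typing import Any, Optional, Union


def normalize_per_pair_options(
    theme_type_pairs: list[tuple[str, str]],
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[Any]]] = None,
) -> list[tuple[tuple[str, str], Optional[int], Optional[Any]]]:
    """Pair every (theme, type) with its own depth and filter (illustrative only)."""
    n = len(theme_type_pairs)

    # A single value (or None) is broadcast to every pair.
    if hierarchy_depth is None or isinstance(hierarchy_depth, int):
        depths = [hierarchy_depth] * n
    else:
        depths = list(hierarchy_depth)

    # A missing filter list means "no filter for any pair".
    filters = [None] * n if pyarrow_filters is None else list(pyarrow_filters)

    # Assumption: list-valued options must have one entry per pair.
    if len(depths) != n or len(filters) != n:
        raise ValueError("Per-pair options must match the number of theme/type pairs.")

    return list(zip(theme_type_pairs, depths, filters))
```

Passing `hierarchy_depth=1` with two pairs yields depth `1` for both, while `hierarchy_depth=[1, None]` limits only the first pair.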

Source code in overturemaestro/advanced_functions/functions.py

def convert_geometry_to_wide_form_geodataframe_for_all_types(
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given geometry in a wide format for all types.

    Automatically downloads Overture Maps dataset for a given release and all available theme/types
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to always return the same list of
            columns in the resulting file. This ensures that the same set of columns is
            returned for a given release across different regions. It also means that some
            columns might be filled entirely with False values. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions, each used to filter a specific theme/type pair. Must be the same length
            as the list of theme/type pairs. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    parsed_geoparquet_file = convert_geometry_to_wide_form_parquet_for_all_types(
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
    return geodataframe_from_parquet(parsed_geoparquet_file)

convert_geometry_to_wide_form_geodataframe_for_multiple_types ¶

convert_geometry_to_wide_form_geodataframe_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> gpd.GeoDataFrame

Get GeoDataFrame for a given geometry in a wide format for multiple types.

Automatically downloads Overture Maps dataset for a given release and theme/type pairs in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`theme_type_pairs`	Pairs of themes and types of the dataset. TYPE: `list[tuple[str, str]]`
`geometry_filter`	Geometry used to filter data. TYPE: `BaseGeometry`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None. TYPE: `Optional[Union[int, list[Optional[int]]]]` DEFAULT: `None`
`pyarrow_filters`	A list of pyarrow expressions, each used to filter a specific theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None. TYPE: `Optional[list[Optional[PYARROW_FILTER]]]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes output after finished. Verbose leaves all progress outputs in the stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`GeoDataFrame`	GeoDataFrame with Overture Maps features. TYPE: `GeoDataFrame`
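
A usage sketch for the signature above. The bounding box coordinates and the two theme/type pairs are illustrative choices. That a `hierarchy_depth` list is matched to the pairs by position is an assumption drawn from the parameter description, not from the implementation. The imports sit inside the function so that the snippet can be loaded without `overturemaestro` and `shapely` installed; calling it requires both, plus network access to download the data.

```python
# Illustrative inputs: two theme/type pairs and a small box over central Paris.
THEME_TYPE_PAIRS = [("places", "place"), ("buildings", "building")]
BBOX = (2.33, 48.85, 2.36, 48.87)  # xmin, ymin, xmax, ymax


def load_wide_form_example():
    """Download both types for BBOX and return one wide-form GeoDataFrame."""
    # Lazy imports: only needed when the function is actually called.
    from shapely.geometry import box

    from overturemaestro.advanced_functions import (
        convert_geometry_to_wide_form_geodataframe_for_multiple_types,
    )

    return convert_geometry_to_wide_form_geodataframe_for_multiple_types(
        theme_type_pairs=THEME_TYPE_PAIRS,
        geometry_filter=box(*BBOX),
        # Assumed to be one entry per pair, in the same order as THEME_TYPE_PAIRS.
        hierarchy_depth=[1, None],
        verbosity_mode="silent",
    )
```

Because `release` is left at its default of `None`, the newest available release is used, so the resulting columns can differ between runs made at different times.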

Source code in overturemaestro/advanced_functions/functions.py

def convert_geometry_to_wide_form_geodataframe_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> gpd.GeoDataFrame:
    """
    Get GeoDataFrame for a given geometry in a wide format for multiple types.

    Automatically downloads Overture Maps dataset for a given release and theme/type pairs
    in a concurrent manner and returns a single GeoDataFrame as a result with multiple columns based
    on dataset schema.

    Args:
        theme_type_pairs (list[tuple[str, str]]): Pairs of themes and types of the dataset.
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to always return the same list of
            columns in the resulting file. This ensures that the same set of columns is
            returned for a given release across different regions. It also means that some
            columns might be filled entirely with False values. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions, each used to filter a specific theme/type pair. Must be the same length
            as the list of theme/type pairs. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        gpd.GeoDataFrame: GeoDataFrame with Overture Maps features.
    """
    parsed_geoparquet_file = convert_geometry_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=theme_type_pairs,
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
    return geodataframe_from_parquet(parsed_geoparquet_file)

convert_geometry_to_wide_form_parquet ¶

convert_geometry_to_wide_form_parquet(
    theme: str,
    type: str,
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given geometry in a wide format.

Automatically downloads Overture Maps dataset for a given release and theme/type in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`theme`	Theme of the dataset. TYPE: `str`
`type`	Type of the dataset. TYPE: `str`
`geometry_filter`	Geometry used to filter data. TYPE: `BaseGeometry`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. If None, will use all available columns. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`pyarrow_filters`	A pyarrow expression used to filter the given theme/type pair. Defaults to None. TYPE: `Optional[PYARROW_FILTER]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`result_file_path`	Where to save the geoparquet file. If not provided, will be generated based on hashes from filters. Defaults to None. TYPE: `Optional[Union[str, Path]]` DEFAULT: `None`
`ignore_cache`	Whether to ignore precalculated geoparquet files or not. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `*.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes output after finished. Verbose leaves all progress outputs in the stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only the primary category for places. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`Path`	Path to the generated GeoParquet file. TYPE: `Path`
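
To make the effect of `hierarchy_depth` and `include_all_possible_columns` concrete, here is a toy model of a wide-form conversion in plain Python. It is a sketch under stated assumptions, not the library's implementation: `to_wide_form`, the `path` field, and the `|` separator in column names are all hypothetical choices made for illustration.

```python
from typing import Optional


def to_wide_form(
    features: list[dict],
    all_known_paths: list[tuple[str, ...]],
    hierarchy_depth: Optional[int] = None,
    include_all_possible_columns: bool = True,
) -> list[dict]:
    """Turn each feature's category path into boolean columns (toy model)."""

    def column_name(path: tuple[str, ...]) -> str:
        # Keep only the first `hierarchy_depth` levels; None keeps all of them.
        kept = path if hierarchy_depth is None else path[:hierarchy_depth]
        return "|".join(kept)

    present = {column_name(f["path"]) for f in features}
    if include_all_possible_columns:
        # Stable schema: every known column appears, even if it is False everywhere.
        columns = sorted(present | {column_name(p) for p in all_known_paths})
    else:
        # Only columns that occur in this particular region.
        columns = sorted(present)

    return [
        {"id": f["id"], **{c: column_name(f["path"]) == c for c in columns}}
        for f in features
    ]
```

With `include_all_possible_columns=True`, two regions with different contents produce rows with identical keys, which is what makes the outputs directly comparable. A smaller `hierarchy_depth` merges sibling categories into one coarser column.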

Source code in overturemaestro/advanced_functions/functions.py

def convert_geometry_to_wide_form_parquet(
    theme: str,
    type: str,
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[int] = None,
    pyarrow_filters: Optional[PYARROW_FILTER] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given geometry in a wide format.

    Automatically downloads Overture Maps dataset for a given release and theme/type
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        theme (str): Theme of the dataset.
        type (str): Type of the dataset.
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to always return the same list of
            columns in the resulting file. This ensures that the same set of columns is
            returned for a given release across different regions. It also means that some
            columns might be filled entirely with False values. Defaults to True.
        hierarchy_depth (Optional[int], optional): Depth used to calculate how many hierarchy
            columns should be used to generate the wide form of the data. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[PYARROW_FILTER], optional): A pyarrow expression used to filter
            the given theme/type pair. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only the primary category
            for places. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    return convert_geometry_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=[(theme, type)],
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=[pyarrow_filters],
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )

convert_geometry_to_wide_form_parquet_for_all_types ¶

convert_geometry_to_wide_form_parquet_for_all_types(
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given geometry in a wide format for all types.

Automatically downloads Overture Maps dataset for a given release and all available theme/types in a concurrent manner and returns a single file as a result with multiple columns based on dataset schema.

PARAMETER	DESCRIPTION
`geometry_filter`	Geometry used to filter data. TYPE: `BaseGeometry`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release across different regions. It also means that some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Defaults to None. TYPE: `Optional[Union[int, list[Optional[int]]]]` DEFAULT: `None`
`pyarrow_filters`	A list of pyarrow expressions, each used to filter a specific theme/type pair. Must be the same length as the list of theme/type pairs. Defaults to None. TYPE: `Optional[list[Optional[PYARROW_FILTER]]]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`result_file_path`	Where to save the GeoParquet file. If not provided, the path will be generated based on hashes from filters. Defaults to None. TYPE: `Optional[Union[str, Path]]` DEFAULT: `None`
`ignore_cache`	Whether to ignore previously generated GeoParquet files and recompute the result. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only primary category from the places dataset. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`Path`	Path to the generated GeoParquet file. TYPE: `Path`
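
The `result_file_path` description above says the default path is generated from hashes of the filters. The following is a minimal sketch of that idea, not the library's implementation: `default_result_path`, the SHA-256 digest, the file name layout, and the release string are all assumptions made for illustration (the real logic lives in the private `_generate_result_file_path`). It shows why identical inputs map to the same file, which is what lets a later call reuse a previously generated result.

```python
import hashlib
from pathlib import Path


def default_result_path(working_directory, release, geometry_wkt, **options):
    """Hypothetical helper: derive a deterministic file path from the inputs."""
    # Sort the options so that keyword order does not change the hash.
    payload = repr((release, geometry_wkt, sorted(options.items())))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return Path(working_directory) / release / f"{digest}_wide_form.parquet"


# "example-release" is a placeholder, not a real release identifier.
first = default_result_path("files", "example-release", "POINT (0 0)", sort_result=True)
second = default_result_path("files", "example-release", "POINT (0 0)", sort_result=True)
other = default_result_path("files", "example-release", "POINT (0 0)", sort_result=False)

print(first == second)  # same inputs, same path
print(first == other)   # a changed option yields a different path
```

Because any change to a filter or option changes the path, passing an explicit `result_file_path` is the way to keep a stable file name across different parameter sets.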

Source code in overturemaestro/advanced_functions/wide_form.py

def convert_geometry_to_wide_form_parquet_for_all_types(
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given geometry in a wide format for all types.

    Automatically downloads Overture Maps dataset for a given release and all available theme/types
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only primary category
            from the places dataset. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    if not release:
        release = get_newest_release_version()

    return convert_geometry_to_wide_form_parquet_for_multiple_types(
        theme_type_pairs=list(get_theme_type_classification(release=release).keys()),
        geometry_filter=geometry_filter,
        release=release,
        include_all_possible_columns=include_all_possible_columns,
        hierarchy_depth=hierarchy_depth,
        pyarrow_filters=pyarrow_filters,
        compression=compression,
        compression_level=compression_level,
        row_group_size=row_group_size,
        result_file_path=result_file_path,
        ignore_cache=ignore_cache,
        working_directory=working_directory,
        verbosity_mode=verbosity_mode,
        max_workers=max_workers,
        sort_result=sort_result,
        places_use_primary_category_only=places_use_primary_category_only,
        places_minimal_confidence=places_minimal_confidence,
    )
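
Since the function above passes every key of `get_theme_type_classification(release=release)` as `theme_type_pairs`, a `pyarrow_filters` list given to it has to contain one slot per available pair, in the same order. A minimal sketch of building such a list follows; `EXAMPLE_PAIRS` is a hypothetical subset standing in for the real classification keys, `filters_for_all_pairs` is an illustrative helper that is not part of the library, and the `confidence` filter is only an example value.

```python
# Hypothetical stand-in for list(get_theme_type_classification(release).keys()).
EXAMPLE_PAIRS = [("base", "water"), ("buildings", "building"), ("places", "place")]


def filters_for_all_pairs(pairs, overrides):
    """Return one filter slot per pair; None means the pair is not filtered."""
    unknown = set(overrides) - set(pairs)
    if unknown:
        raise KeyError(f"Unknown theme/type pairs: {sorted(unknown)}")
    # Preserve the order of `pairs` so that positions line up.
    return [overrides.get(pair) for pair in pairs]


filters = filters_for_all_pairs(
    EXAMPLE_PAIRS,
    {("places", "place"): [("confidence", ">", 0.9)]},
)
print(filters)
```

In real use, the pair list would come from the classification for the chosen release, so the release should be fixed explicitly when the filters are prepared ahead of time.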

convert_geometry_to_wide_form_parquet_for_multiple_types ¶

convert_geometry_to_wide_form_parquet_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75
) -> Path

Get GeoParquet file for a given geometry in a wide format for multiple types.

Automatically downloads the Overture Maps dataset for a given release and the given theme/type pairs concurrently, and returns the path to a single file whose columns are derived from the dataset schema.

PARAMETER	DESCRIPTION
`theme_type_pairs`	Pairs of themes and types of the dataset. TYPE: `list[tuple[str, str]]`
`geometry_filter`	Geometry used to filter data. TYPE: `BaseGeometry`
`release`	Release version. If not provided, will automatically load newest available release version. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`include_all_possible_columns`	Whether to always return the same list of columns in the resulting file. This ensures that the same set of columns is returned for a given release regardless of region. As a result, some columns might be filled entirely with False values. Defaults to True. TYPE: `bool` DEFAULT: `True`
`hierarchy_depth`	Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers; a list must have the same length as the list of theme/type pairs. If None, will use all available columns. Defaults to None. TYPE: `Optional[Union[int, list[Optional[int]]]]` DEFAULT: `None`
`pyarrow_filters`	A list of pyarrow filters, one for each theme/type pair; use None for a pair that needs no filter. Must be the same length as the list of theme/type pairs. Defaults to None. TYPE: `Optional[list[Optional[PYARROW_FILTER]]]` DEFAULT: `None`
`compression`	Compression of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Remember to change compression level together with this parameter. Defaults to "zstd". TYPE: `str` DEFAULT: `PARQUET_COMPRESSION`
`compression_level`	Compression level of the final parquet file. Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info. Defaults to 3. TYPE: `int` DEFAULT: `PARQUET_COMPRESSION_LEVEL`
`row_group_size`	Approximate number of rows per row group in the final parquet file. Defaults to 100_000. TYPE: `int` DEFAULT: `PARQUET_ROW_GROUP_SIZE`
`result_file_path`	Where to save the GeoParquet file. If not provided, the path will be generated based on hashes from filters. Defaults to None. TYPE: `Optional[Union[str, Path]]` DEFAULT: `None`
`ignore_cache`	Whether to ignore previously generated GeoParquet files and recompute the result. Defaults to False. TYPE: `bool` DEFAULT: `False`
`working_directory`	Directory where to save the downloaded `.parquet` files. Defaults to "files". TYPE: `Union[str, Path]` DEFAULT: `'files'`
`verbosity_mode`	Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes the output once finished. Verbose leaves all progress output in stdout. Defaults to "transient". TYPE: `Literal["silent", "transient", "verbose"]` DEFAULT: `'transient'`
`max_workers`	Max number of multiprocessing workers used to process the dataset. Defaults to None. TYPE: `Optional[int]` DEFAULT: `None`
`sort_result`	Whether to sort the result by geometry or not. Defaults to True. TYPE: `bool` DEFAULT: `True`
`places_use_primary_category_only`	Whether to use only primary category from the places dataset. Defaults to False. TYPE: `bool` DEFAULT: `False`
`places_minimal_confidence`	Minimal confidence level for the places dataset. Defaults to 0.75. TYPE: `float` DEFAULT: `0.75`

RETURNS	DESCRIPTION
`Path`	Path to the generated GeoParquet file. TYPE: `Path`
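
The list-valued `hierarchy_depth` and `pyarrow_filters` parameters are matched to `theme_type_pairs` by position. The sketch below illustrates that alignment with plain Python. `align_per_pair` is a hypothetical helper, and repeating a single integer depth for every pair is an assumption about how a scalar value is treated; only the two length checks mirror errors the function is documented to raise in its source.

```python
from typing import Any, Optional, Union

Pair = tuple[str, str]


def align_per_pair(
    theme_type_pairs: list[Pair],
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[Any]]] = None,
) -> list[tuple[Pair, Optional[int], Optional[Any]]]:
    """Hypothetical helper: pair each theme/type with its depth and filter."""
    n = len(theme_type_pairs)
    if pyarrow_filters is not None and len(pyarrow_filters) != n:
        raise ValueError("Pyarrow filters length doesn't match length of theme type pairs.")
    if isinstance(hierarchy_depth, list):
        if len(hierarchy_depth) != n:
            raise ValueError(
                "Hierarchy depth list length doesn't match length of theme type pairs."
            )
        depths = hierarchy_depth
    else:
        # Assumption: a single value (or None) applies to every pair.
        depths = [hierarchy_depth] * n
    filters = pyarrow_filters if pyarrow_filters is not None else [None] * n
    return list(zip(theme_type_pairs, depths, filters))


pairs = [("buildings", "building"), ("places", "place")]
aligned = align_per_pair(
    pairs,
    hierarchy_depth=1,
    pyarrow_filters=[None, [("confidence", ">", 0.9)]],
)
for pair, depth, pair_filter in aligned:
    print(pair, depth, pair_filter)
```

Keeping the three lists in one place like this makes it harder to shift a filter onto the wrong pair when the pair list is edited.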

Source code in overturemaestro/advanced_functions/wide_form.py

@show_total_elapsed_time_decorator
def convert_geometry_to_wide_form_parquet_for_multiple_types(
    theme_type_pairs: list[tuple[str, str]],
    geometry_filter: BaseGeometry,
    release: Optional[str] = None,
    *,
    include_all_possible_columns: bool = True,
    hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = None,
    pyarrow_filters: Optional[list[Optional[PYARROW_FILTER]]] = None,
    compression: str = PARQUET_COMPRESSION,
    compression_level: int = PARQUET_COMPRESSION_LEVEL,
    row_group_size: int = PARQUET_ROW_GROUP_SIZE,
    result_file_path: Optional[Union[str, Path]] = None,
    ignore_cache: bool = False,
    working_directory: Union[str, Path] = "files",
    verbosity_mode: VERBOSITY_MODE = "transient",
    max_workers: Optional[int] = None,
    sort_result: bool = True,
    places_use_primary_category_only: bool = False,
    places_minimal_confidence: float = 0.75,
) -> Path:
    """
    Get GeoParquet file for a given geometry in a wide format for multiple types.

    Automatically downloads Overture Maps dataset for a given release and theme/type pairs
    in a concurrent manner and returns a single file as a result with multiple columns based
    on dataset schema.

    Args:
        theme_type_pairs (list[tuple[str, str]]): Pairs of themes and types of the dataset.
        geometry_filter (BaseGeometry): Geometry used to filter data.
        release (Optional[str], optional): Release version. If not provided, will automatically load
            newest available release version. Defaults to None.
        include_all_possible_columns (bool, optional): Whether to have always the same list of
            columns in the resulting file. This ensures that always the same set of columns is
            returned for a given release for different regions. This also means, that some columns
            might be all filled with a False value. Defaults to True.
        hierarchy_depth (Optional[Union[int, list[Optional[int]]]], optional): Depth used to
            calculate how many hierarchy columns should be used to generate the wide form of
            the data. Can be a single integer or a list of integers. If None, will use all
            available columns. Defaults to None.
        pyarrow_filters (Optional[list[Optional[PYARROW_FILTER]]], optional): A list of pyarrow
            expressions used to filter specific theme type pair. Must be the same length as the list
            of theme type pairs. Defaults to None.
        compression (str, optional): Compression of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Remember to change compression level together with this parameter.
            Defaults to "zstd".
        compression_level (int, optional): Compression level of the final parquet file.
            Check https://duckdb.org/docs/sql/statements/copy#parquet-options for more info.
            Defaults to 3.
        row_group_size (int, optional): Approximate number of rows per row group in the final
            parquet file. Defaults to 100_000.
        result_file_path (Union[str, Path], optional): Where to save
            the geoparquet file. If not provided, will be generated based on hashes
            from filters. Defaults to None.
        ignore_cache (bool, optional): Whether to ignore precalculated geoparquet files or not.
            Defaults to False.
        working_directory (Union[str, Path], optional): Directory where to save
            the downloaded `*.parquet` files. Defaults to "files".
        verbosity_mode (Literal["silent", "transient", "verbose"], optional): Set progress
            verbosity mode. Can be one of: silent, transient and verbose. Silent disables
            output completely. Transient tracks progress, but removes output after finished.
            Verbose leaves all progress outputs in the stdout. Defaults to "transient".
        max_workers (Optional[int], optional): Max number of multiprocessing workers used to
            process the dataset. Defaults to None.
        sort_result (bool, optional): Whether to sort the result by geometry or not.
            Defaults to True.
        places_use_primary_category_only (bool, optional): Whether to use only primary category
            from the places dataset. Defaults to False.
        places_minimal_confidence (float, optional): Minimal confidence level for the places
            dataset. Defaults to 0.75.

    Returns:
        Path: Path to the generated GeoParquet file.
    """
    if pyarrow_filters is not None and len(theme_type_pairs) != len(pyarrow_filters):
        raise ValueError("Pyarrow filters length doesn't match length of theme type pairs.")

    if isinstance(hierarchy_depth, list) and len(theme_type_pairs) != len(hierarchy_depth):
        raise ValueError("Hierarchy depth list length doesn't match length of theme type pairs.")

    if not release:
        release = get_newest_release_version()

    pyarrow_filters_list = []
    for idx in range(len(theme_type_pairs)):
        _pyarrow_filter = pyarrow_filters[idx] if pyarrow_filters else None

        if _pyarrow_filter is not None:
            from pyarrow.parquet import filters_to_expression

            _pyarrow_filter = filters_to_expression(_pyarrow_filter)

        pyarrow_filters_list.append(_pyarrow_filter)

    if result_file_path is None:
        result_file_path = working_directory / _generate_result_file_path(
            release=release,
            theme_type_pairs=theme_type_pairs,
            geometry_filter=geometry_filter,
            include_all_possible_columns=include_all_possible_columns,
            hierarchy_depth=hierarchy_depth,
            pyarrow_filters=pyarrow_filters_list,
            sort_result=sort_result,
            places_use_primary_category_only=places_use_primary_category_only,
            places_minimal_confidence=places_minimal_confidence,
        )

    result_file_path = Path(result_file_path)

    if not result_file_path.exists() or ignore_cache:
        result_file_path.parent.mkdir(exist_ok=True, parents=True)

        prepared_download_parameters = _prepare_download_parameters_for_all_theme_type_pairs(
            release=release,
            theme_type_pairs=theme_type_pairs,
            geometry_filter=geometry_filter,
            hierarchy_depth=hierarchy_depth,
            pyarrow_filters=pyarrow_filters_list,
            verbosity_mode=verbosity_mode,
            places_minimal_confidence=places_minimal_confidence,
        )

        hierachy_columns_list, columns_to_download_list, pyarrow_filter_list = zip(
            *prepared_download_parameters
        )

        downloaded_parquet_files = download_data_for_multiple_types(
            release=release,
            theme_type_pairs=theme_type_pairs,
            geometry_filter=geometry_filter,
            pyarrow_filters=pyarrow_filter_list,
            columns_to_download=[
                [INDEX_COLUMN, GEOMETRY_COLUMN, *columns_to_download]
                for columns_to_download in columns_to_download_list
            ],
            compression=compression,
            compression_level=compression_level,
            row_group_size=row_group_size,
            ignore_cache=ignore_cache,
            working_directory=working_directory,
            verbosity_mode=verbosity_mode,
            max_workers=max_workers,
            sort_result=False,
        )

        with tempfile.TemporaryDirectory(dir=Path(working_directory).resolve()) as tmp_dir_name:
            tmp_dir_path = Path(tmp_dir_name)

            merged_parquet_path = (
                tmp_dir_path / f"{result_file_path.stem}_merged{result_file_path.suffix}"
            )

            transformed_wide_form_directory_output = tmp_dir_path / "wide_form_files"
            transformed_wide_form_directory_output.mkdir(parents=True, exist_ok=True)

            with TrackProgressBar(verbosity_mode=verbosity_mode) as progress:
                for (
                    (theme_value, type_value),
                    hierachy_columns,
                    downloaded_parquet_file,
                ) in progress.track(
                    zip(theme_type_pairs, hierachy_columns_list, downloaded_parquet_files),
                    total=len(theme_type_pairs),
                    description="Transforming data into wide form",
                ):
                    wide_form_definition = get_theme_type_classification(release=release)[
                        (theme_value, type_value)
                    ]

                    output_path = (
                        transformed_wide_form_directory_output
                        / f"{theme_value}_{type_value}.parquet"
                    )
                    if len(theme_type_pairs) == 1:
                        output_path = merged_parquet_path

                    if not hierachy_columns:
                        _transform_to_wide_form_without_hierarchy(
                            theme=theme_value,
                            type=type_value,
                            parquet_file=downloaded_parquet_file,
                            output_path=output_path,
                            compression=compression,
                            compression_level=compression_level,
                            row_group_size=row_group_size,
                            working_directory=tmp_dir_path,
                        )
                    else:
                        wide_form_definition.data_transform_function(
                            theme=theme_value,
                            type=type_value,
                            release_version=release,
                            parquet_file=downloaded_parquet_file,
                            output_path=output_path,
                            compression=compression,
                            compression_level=compression_level,
                            row_group_size=row_group_size,
                            include_all_possible_columns=include_all_possible_columns,
                            hierarchy_columns=hierachy_columns,
                            working_directory=tmp_dir_path,
                            verbosity_mode=verbosity_mode,
                            places_use_primary_category_only=places_use_primary_category_only,
                        )

            if len(theme_type_pairs) > 1:
                with TrackProgressSpinner(
                    "Joining results to a single file", verbosity_mode=verbosity_mode
                ):
                    _combine_multiple_wide_form_files(
                        theme_type_pairs=theme_type_pairs,
                        transformed_wide_form_directory=transformed_wide_form_directory_output,
                        output_path=merged_parquet_path,
                        compression=compression,
                        compression_level=compression_level,
                        row_group_size=row_group_size,
                        working_directory=tmp_dir_path,
                    )

            if sort_result:
                with TrackProgressBar(verbosity_mode=verbosity_mode) as progress_bar:
                    total_rows = pq.read_metadata(merged_parquet_path).num_rows
                    task = progress_bar.add_task(
                        description="Sorting result file by geometry", total=total_rows
                    )

                    def progress_callback(processed: int) -> None:
                        progress_bar.update(task, completed=processed, refresh=True)

                    columns = pq.read_schema(merged_parquet_path).names
                    value_columns = [
                        col for col in columns if col not in (INDEX_COLUMN, GEOMETRY_COLUMN)
                    ]

                    compressed_parquet_path = tmp_dir_path / "_compressed.parquet"
                    _compress_value_columns(
                        input_file=merged_parquet_path,
                        output_file=compressed_parquet_path,
                        value_columns=value_columns,
                        working_directory=tmp_dir_path,
                    )

                    merged_parquet_path.unlink(missing_ok=True)

                    sorted_parquet_path = tmp_dir_path / "_sorted.parquet"
                    sort_geoparquet_file_by_geometry(
                        input_file_path=compressed_parquet_path,
                        output_file_path=sorted_parquet_path,
                        compression="zstd",
                        compression_level=3,
                        row_group_size=row_group_size,
                        working_directory=working_directory,
                        sort_extent=geometry_filter.bounds,
                        verbosity_mode=verbosity_mode,
                        progress_callback=progress_callback,
                    )

                    compressed_parquet_path.unlink(missing_ok=True)

                    _decompress_value_columns(
                        input_file=sorted_parquet_path,
                        output_file=result_file_path,
                        value_columns=value_columns,
                        working_directory=tmp_dir_path,
                        compression=compression,
                        compression_level=compression_level,
                        row_group_size=row_group_size,
                        verbosity_mode=verbosity_mode,
                    )

                    progress_callback(total_rows)

                    sorted_parquet_path.unlink(missing_ok=True)
            else:
                merged_parquet_path.rename(result_file_path)

    return result_file_path
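
The control flow above does its work only when the result file is missing or `ignore_cache` is set, and it stages intermediate files in a temporary directory created inside `working_directory` before the final file is written. A stripped-down sketch of that pattern, using only the standard library, is shown below. `build_with_cache` and `fake_build` are hypothetical names, and the sketch uses `Path.replace` for the final move so that an existing target is overwritten on every platform, which is a choice made here and not taken from the source.

```python
import tempfile
from pathlib import Path


def build_with_cache(result_file_path, build, ignore_cache=False, working_directory="files"):
    """Hypothetical helper: run `build` only if the result is missing or stale."""
    result_file_path = Path(result_file_path)
    if result_file_path.exists() and not ignore_cache:
        return result_file_path  # Cache hit: skip all work.

    result_file_path.parent.mkdir(parents=True, exist_ok=True)
    work_dir = Path(working_directory).resolve()
    work_dir.mkdir(parents=True, exist_ok=True)

    # Stage the output in a temporary directory so that a failed build
    # never leaves a partial file at the final location.
    with tempfile.TemporaryDirectory(dir=work_dir) as tmp_dir_name:
        staged = Path(tmp_dir_name) / result_file_path.name
        build(staged)
        staged.replace(result_file_path)
    return result_file_path


calls = []


def fake_build(path):
    # Stand-in for the download, transform, and sort steps.
    calls.append(path.name)
    path.write_text("data")


with tempfile.TemporaryDirectory() as base:
    target = Path(base) / "out" / "result.parquet"
    build_with_cache(target, fake_build, working_directory=base)
    build_with_cache(target, fake_build, working_directory=base)  # cache hit
    build_with_cache(target, fake_build, ignore_cache=True, working_directory=base)
    build_count = len(calls)

print(build_count)
```

The second call returns immediately because the file already exists, while the third call rebuilds it because `ignore_cache=True`.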