loaders ¶
This module contains loaders, used to load spatial data from different sources.
The loaders unify loading from different data sources behind a single interface. The result is a unified spatial data format that can be fed into any of the embedding methods available in this library.
GTFSLoader ¶
Bases: Loader
GTFSLoader.
This loader reads a GTFS feed and calculates time aggregations in 1-hour slots.
Source code in srai/loaders/gtfs_loader.py
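The 1-hour aggregation can be pictured with a small standard-library sketch. It is not the loader's implementation: the helper names `hour_slot` and `trips_per_hour` are hypothetical, and wrapping hours of 24 or more back into the 0-23 range is an assumption made here for illustration (GTFS itself permits such times for trips that run past midnight).

```python
from collections import Counter


def hour_slot(gtfs_time: str) -> int:
    """Map a GTFS 'HH:MM:SS' time to an hour-of-day slot (0-23)."""
    hours = int(gtfs_time.split(":")[0])
    # GTFS allows hours >= 24 for trips running past midnight of the service day.
    # Wrapping them with modulo is an assumption of this sketch.
    return hours % 24


def trips_per_hour(departures: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count departures per stop and hour slot from (stop_id, departure_time) pairs."""
    counts: dict[str, Counter] = {}
    for stop_id, departure_time in departures:
        counts.setdefault(stop_id, Counter())[hour_slot(departure_time)] += 1
    return counts


sample = [
    ("stop_a", "08:05:00"),
    ("stop_a", "08:45:00"),
    ("stop_a", "09:10:00"),
    ("stop_b", "25:30:00"),  # 01:30 on the next calendar day
]
hourly = trips_per_hour(sample)
```

Here `hourly["stop_a"]` holds two departures in slot 8 and one in slot 9, which is the kind of per-stop, per-hour count the loader description refers to.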
load ¶
load(
gtfs_file: Path,
fail_on_validation_errors: bool = True,
skip_validation: bool = False,
) -> gpd.GeoDataFrame
Load GTFS feed and calculate time aggregations for stops.
PARAMETER | DESCRIPTION |
---|---|
gtfs_file | Path to the GTFS feed. |
fail_on_validation_errors | Fail if GTFS feed is invalid. Ignored when skip_validation is True. |
skip_validation | Skip GTFS feed validation. |
RETURNS | DESCRIPTION |
---|---|
GeoDataFrame | gpd.GeoDataFrame: GeoDataFrame with trip counts and list of directions for stops. |
Source code in srai/loaders/gtfs_loader.py
GeoparquetLoader ¶
Bases: Loader
GeoparquetLoader.
The Geoparquet [1] loader wraps the geopandas.read_parquet function and adds automatic index setting and optional geometry clipping.
load ¶
load(
file_path: Union[Path, str],
index_column: Optional[str] = None,
columns: Optional[list[str]] = None,
area: Optional[gpd.GeoDataFrame] = None,
) -> gpd.GeoDataFrame
Load a geoparquet file.
PARAMETER | DESCRIPTION |
---|---|
file_path | Parquet file path. |
index_column | Column that will be used as an index. If not provided, automatic indexing will be applied. Defaults to None. |
columns | List of columns to load. If not provided, all will be loaded. Defaults to None. |
area | Mask to clip loaded data. If not provided, unaltered data will be returned. Defaults to None. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the provided index column doesn't exist in the list of loaded columns. |
RETURNS | DESCRIPTION |
---|---|
GeoDataFrame | gpd.GeoDataFrame: Loaded geoparquet file as a GeoDataFrame. |
Source code in srai/loaders/geoparquet_loader.py
Loader ¶
Bases: ABC
Abstract class for loaders.
load ¶
abstractmethod
Load data for a given area.
PARAMETER | DESCRIPTION |
---|---|
*args | Positional arguments depending on a specific loader. |
**kwargs | Keyword arguments depending on a specific loader. |
RETURNS | DESCRIPTION |
---|---|
GeoDataFrame | GeoDataFrame with the downloaded data. |
Source code in srai/loaders/_base.py
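The abstract-base pattern described above can be sketched with the standard library alone. The classes `SketchLoader` and `InMemoryLoader` are hypothetical stand-ins, not part of the library, and they return plain lists instead of a GeoDataFrame so the example has no dependencies.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchLoader(ABC):
    """Stand-in for an abstract loader base class (hypothetical name)."""

    @abstractmethod
    def load(self, *args: Any, **kwargs: Any) -> Any:
        """Load data; the arguments depend on the concrete loader."""
        raise NotImplementedError


class InMemoryLoader(SketchLoader):
    """Toy loader that filters pre-built records by area name."""

    def __init__(self, records: list[dict[str, Any]]) -> None:
        self._records = records

    def load(self, area_name: str) -> list[dict[str, Any]]:
        # Each concrete loader is free to define its own arguments.
        return [r for r in self._records if r["area"] == area_name]


loader = InMemoryLoader(
    [{"area": "north", "value": 1}, {"area": "south", "value": 2}]
)
north = loader.load("north")
```

Instantiating `SketchLoader` directly raises a `TypeError`, which is how the abstract `load` method forces every concrete loader to provide its own implementation.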
OSMLoader ¶
Bases: Loader, ABC
Abstract class for loaders.
load ¶
abstractmethod
load(
area: Union[
BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame
],
tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
) -> gpd.GeoDataFrame
Load data for a given area.
PARAMETER | DESCRIPTION |
---|---|
area | Shapely geometry with the area of interest. |
tags | OSM tags filter. |
RETURNS | DESCRIPTION |
---|---|
GeoDataFrame | gpd.GeoDataFrame: GeoDataFrame with the downloaded data. |
Source code in srai/loaders/osm_loaders/_base.py
OSMNetworkType ¶
Bases: str, Enum
Type of the street network.
See [1] for more details.
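Because the enum derives from both str and Enum, its members compare equal to plain strings and can be looked up by value. The sketch below shows that behaviour with a hypothetical `NetworkTypeSketch`; the three members are illustrative assumptions and are not a listing of the library's actual members.

```python
from enum import Enum
from typing import Union


class NetworkTypeSketch(str, Enum):
    """Hypothetical str-based enum; member names and values are illustrative."""

    DRIVE = "drive"
    WALK = "walk"
    BIKE = "bike"


def normalize_network_type(
    network_type: Union[NetworkTypeSketch, str],
) -> NetworkTypeSketch:
    """Accept either an enum member or its string value."""
    # Enum lookup by value works for both plain strings and existing members.
    return NetworkTypeSketch(network_type)


from_string = normalize_network_type("drive")
from_member = normalize_network_type(NetworkTypeSketch.BIKE)
```

This is why an argument typed as a union of the enum and str can be normalised in one call; an unknown string raises a `ValueError`.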
OSMOnlineLoader ¶
Bases: OSMLoader
OSMOnlineLoader.
The OSM (OpenStreetMap) [1] online loader downloads objects for a given area from OSM. It filters features based on OSM tags [2] in the form of key:value pairs, which OSM users apply to give meaning to geometries.
This loader is a wrapper around the osmnx library. It uses osmnx.geometries_from_polygon to make individual queries.
Source code in srai/loaders/osm_loaders/osm_online_loader.py
load ¶
load(
area: Union[
BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame
],
tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
) -> gpd.GeoDataFrame
Download OSM features with specified tags for a given area.
The loader first downloads all objects with tags. It returns a GeoDataFrame containing the geometry column and columns for tag keys.
Some key/value pairs might be missing from the resulting GeoDataFrame, simply because there are no such objects in the given area.
PARAMETER | DESCRIPTION |
---|---|
area | Area for which to download objects. |
tags | A dictionary specifying which tags to download. The keys should be OSM tags (e.g. |
RETURNS | DESCRIPTION |
---|---|
GeoDataFrame | gpd.GeoDataFrame: Downloaded features as a GeoDataFrame. |
Source code in srai/loaders/osm_loaders/osm_online_loader.py
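Filtering by key:value pairs can be illustrated without any OSM tooling. The sketch below assumes a filter maps each tag key to True (any value), a single string, or a list of strings; that shape, the `TagValue` alias, and the `matches` helper are assumptions made for illustration and are not taken from the library.

```python
from typing import Union

# Assumed filter value shape: True (any value), one value, or several values.
TagValue = Union[bool, str, list[str]]


def matches(feature_tags: dict[str, str], tags_filter: dict[str, TagValue]) -> bool:
    """Return True if any filter entry matches the feature's tags."""
    for key, wanted in tags_filter.items():
        if key not in feature_tags:
            continue
        value = feature_tags[key]
        if wanted is True:
            return True  # any value of this key is accepted
        if isinstance(wanted, str) and value == wanted:
            return True
        if isinstance(wanted, list) and value in wanted:
            return True
    return False


cafe_matches = matches({"amenity": "cafe"}, {"amenity": ["cafe", "bar"]})
building_matches = matches({"building": "yes"}, {"building": True})
bank_matches = matches({"amenity": "bank"}, {"amenity": "cafe"})
```

If no feature in the area matches a given pair, nothing is returned for it, which is why some key/value pairs can be absent from the result.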
OSMPbfLoader ¶
OSMPbfLoader(
pbf_file: Optional[Union[str, Path]] = None,
download_source: OsmExtractSource = "any",
download_directory: Union[str, Path] = "files",
verbosity_mode: Literal["silent", "transient", "verbose"] = "transient",
)
Bases: OSMLoader
OSMPbfLoader.
The OSM (OpenStreetMap) [1] PBF (Protocolbuffer Binary Format) [2] loader loads OSM features from a PBF file. It filters features based on OSM tags [3] in the form of key:value pairs, which OSM users apply to give meaning to geometries.
This loader uses PbfFileReader from the QuackOSM [3] library. It utilizes the duckdb [4] engine with the spatial [5] extension, capable of parsing an *.osm.pbf file.
Additionally, it can download a PBF file extract for a given area using different sources.
PARAMETER | DESCRIPTION |
---|---|
pbf_file | Downloaded |
download_source | Source to use when downloading PBF files. Can be one of: |
download_directory | Directory where to save the downloaded |
verbosity_mode | Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes output after finished. Verbose leaves all progress outputs in the stdout. Defaults to "transient". |
Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
load ¶
load(
area: Union[
BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame
],
tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
ignore_cache: bool = False,
explode_tags: bool = True,
keep_all_tags: bool = False,
) -> gpd.GeoDataFrame
Load OSM features with specified tags for a given area from an *.osm.pbf file.
The loader will use the provided *.osm.pbf file, or download extracts automatically. Later it will parse and filter features from files using PbfFileReader from the QuackOSM library. It will return a GeoDataFrame containing the geometry column and columns for tag keys.
Some key/value pairs might be missing from the resulting GeoDataFrame, simply because there are no such objects in the given area.
PARAMETER | DESCRIPTION |
---|---|
area | Area for which to download objects. |
tags | A dictionary specifying which tags to download. The keys should be OSM tags (e.g. |
ignore_cache | (bool, optional): Whether to ignore precalculated geoparquet files or not. Defaults to False. |
explode_tags | (bool, optional): Whether to split OSM tags into multiple columns or keep them in a single dict. Defaults to True. |
keep_all_tags | (bool, optional): Whether to keep all tags related to the element, or return only those defined in the |
RAISES | DESCRIPTION |
---|---|
ValueError | If PBF file is expected to be downloaded and provided geometries aren't shapely.geometry.Polygons. |
RETURNS | DESCRIPTION |
---|---|
GeoDataFrame | gpd.GeoDataFrame: Downloaded features as a GeoDataFrame. |
Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
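The difference between exploded tags and tags kept in a single dict can be shown on plain Python records. This is only an illustration of the two layouts: the `explode_tags_sketch` helper, the `feature_id` field name, and the use of None for absent values are assumptions, and the loader itself returns a GeoDataFrame.

```python
from typing import Any, Optional


def explode_tags_sketch(features: list[dict[str, Any]]) -> list[dict[str, Optional[str]]]:
    """Turn a per-feature 'tags' dict into one column per tag key."""
    # Collect every tag key seen in the input so all rows share the same columns.
    keys = sorted({key for feature in features for key in feature["tags"]})
    return [
        {
            "feature_id": feature["feature_id"],
            # Tags a feature does not carry are filled with None.
            **{key: feature["tags"].get(key) for key in keys},
        }
        for feature in features
    ]


compact = [
    {"feature_id": "node/1", "tags": {"amenity": "cafe"}},
    {"feature_id": "way/2", "tags": {"building": "yes", "amenity": "bank"}},
]
exploded = explode_tags_sketch(compact)
```

In the exploded layout every tag key becomes its own column, while the compact layout keeps one dict per feature. A key that no feature carries never produces a column, which matches the note about missing key/value pairs.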
load_to_geoparquet ¶
load_to_geoparquet(
area: Union[
BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame
],
tags: Union[OsmTagsFilter, GroupedOsmTagsFilter],
ignore_cache: bool = False,
explode_tags: bool = True,
keep_all_tags: bool = False,
) -> Path
Load OSM features with specified tags for a given area and save them to a geoparquet file.
PARAMETER | DESCRIPTION |
---|---|
area | Area for which to download objects. |
tags | A dictionary specifying which tags to download. The keys should be OSM tags (e.g. |
ignore_cache | (bool, optional): Whether to ignore precalculated geoparquet files or not. Defaults to False. |
explode_tags | (bool, optional): Whether to split OSM tags into multiple columns or keep them in a single dict. Defaults to True. |
keep_all_tags | (bool, optional): Whether to keep all tags related to the element, or return only those defined in the |
RETURNS | DESCRIPTION |
---|---|
Path | Path to the saved GeoParquet file. |
Source code in srai/loaders/osm_loaders/osm_pbf_loader.py
OSMTileLoader ¶
OSMTileLoader(
tile_server_url: str,
zoom: int,
verbose: bool = False,
resource_type: str = "png",
auth_token: Optional[str] = None,
data_collector: Optional[Union[str, DataCollector]] = None,
storage_path: Optional[Union[str, Path]] = None,
)
OSM Tile Loader.
Downloads raster tiles from a user-specified tile server, such as those listed in [1]. The loader finds the x, y tile coordinates [2] for the specified area and downloads the tiles. The address is built with the schema {tile_server_url}/{zoom}/{x}/{y}.{resource_type}
PARAMETER | DESCRIPTION |
---|---|
tile_server_url | URL of the tile server, without z, x, y parameters. |
zoom | Zoom level [1]. |
verbose | Whether to print logs. Defaults to False. |
resource_type | File extension, added to the end of the URL. Defaults to "png". |
auth_token | Auth token, added as the access_token parameter to the request. Defaults to None. |
data_collector | DataCollector object or |
storage_path | Path to save data, used with SavingDataCollector. Defaults to None. |
Source code in srai/loaders/osm_loaders/osm_tile_loader.py
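The address schema above can be tried out with a short sketch. The coordinate conversion uses the commonly published slippy-map tile formula, which is an assumption about what the x, y coordinates refer to rather than a copy of the loader's code. The helper names and the example server URL are hypothetical.

```python
import math


def lat_lon_to_tile(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 coordinates to slippy-map tile x, y at a zoom level."""
    n = 2**zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # No clamping is done here, so lon == 180 or extreme latitudes
    # can fall outside the valid 0..n-1 range.
    return x, y


def tile_url(
    tile_server_url: str, zoom: int, x: int, y: int, resource_type: str = "png"
) -> str:
    """Build the address as {tile_server_url}/{zoom}/{x}/{y}.{resource_type}."""
    return f"{tile_server_url}/{zoom}/{x}/{y}.{resource_type}"


x, y = lat_lon_to_tile(0.0, 0.0, zoom=1)
url = tile_url("https://tiles.example.com", 1, x, y)
```

At zoom 1 the world is split into a 2 by 2 grid, so the point at latitude 0, longitude 0 lands in tile (1, 1) and the resulting address ends in /1/1/1.png.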
get_tile_by_x_y ¶
Download a single tile from the tile server. Returns the tile processed by the DataCollector.
PARAMETER | DESCRIPTION |
---|---|
x | (int): x tile coordinate. |
y | (int): y tile coordinate. |
idx | ID of the tile. If None, it is created as x_y_self.zoom. |
Source code in srai/loaders/osm_loaders/osm_tile_loader.py
load ¶
load(
area: Union[
BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame
],
) -> gpd.GeoDataFrame
Return all tiles of region.
PARAMETER | DESCRIPTION |
---|---|
area | Area for which to download objects. |
RETURNS | DESCRIPTION |
---|---|
GeoDataFrame | gpd.GeoDataFrame: GeoDataFrame of tiles for each region in the area, transformed by the DataCollector. |
Source code in srai/loaders/osm_loaders/osm_tile_loader.py
OSMWayLoader ¶
OSMWayLoader(
network_type: Union[OSMNetworkType, str],
contain_within_area: bool = False,
preprocess: bool = True,
wide: bool = True,
metadata: bool = False,
osm_way_tags: dict[str, list[str]] = constants.OSM_WAY_TAGS,
)
Bases: Loader
OSMWayLoader downloads road infrastructure from OSM.
OSMWayLoader is a wrapper for osmnx.graph_from_polygon() and osmnx.graph_to_gdfs() that simplifies obtaining the road infrastructure data from OpenStreetMap. As OSM data is often noisy, it can also take an opinionated approach to preprocessing it, with standardisation in mind, e.g. unification of units, discarding non-wiki values, and rounding them.
PARAMETER | DESCRIPTION |
---|---|
network_type | Type of the network to download. |
contain_within_area | Whether to remove the roads that have one of their nodes outside of the given area. Defaults to False. |
preprocess | Whether to preprocess the data. Defaults to True. |
wide | Whether to return the roads in wide format. Defaults to True. |
metadata | Whether to return metadata for roads. Defaults to False. |
osm_way_tags | Dict of tags to take into consideration during computing. Defaults to constants.OSM_WAY_TAGS. |
Source code in srai/loaders/osm_way_loader/osm_way_loader.py
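To give a feel for what unit unification, discarding unusable values, and rounding might look like, here is a deliberately small sketch for a speed-limit tag. The rules are assumptions chosen for illustration (convert mph to km/h, drop anything that is not a plain number with an optional unit, round to the nearest 10 km/h). The loader's real preprocessing rules are not reproduced here, and `normalize_maxspeed` is a hypothetical helper.

```python
import re
from typing import Optional

MPH_TO_KMH = 1.609344

# A number, optionally followed by a unit; anything else is treated as noise.
_SPEED_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mph|km/h)?")


def normalize_maxspeed(raw: str) -> Optional[int]:
    """Return a speed in km/h rounded to the nearest 10, or None if unusable."""
    match = _SPEED_PATTERN.fullmatch(raw.strip().lower())
    if match is None:
        return None  # discard values that do not follow the expected form
    value = float(match.group(1))
    if match.group(2) == "mph":
        value *= MPH_TO_KMH  # unify units to km/h
    return int(round(value / 10.0)) * 10


from_mph = normalize_maxspeed("30 mph")
from_plain = normalize_maxspeed("50")
from_noise = normalize_maxspeed("fast")
```

With these assumed rules, "30 mph" and "50" both end up as 50 km/h, while a free-text value such as "fast" is discarded.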
load ¶
Load road infrastructure for a given GeoDataFrame.
PARAMETER | DESCRIPTION |
---|---|
area | (Multi)Polygons for which to download road infrastructure data. |
RAISES | DESCRIPTION |
---|---|
ValueError | If provided GeoDataFrame has no crs defined. |
ValueError | If provided GeoDataFrame is empty. |
TypeError | If provided geometries are not of type Polygon or MultiPolygon. |
LoadedDataIsEmptyException | If none of the supplied area polygons contains any road infrastructure data. |
RETURNS | DESCRIPTION |
---|---|
tuple[GeoDataFrame, GeoDataFrame] | Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]: Road infrastructure as (intersections, roads) |
Source code in srai/loaders/osm_way_loader/osm_way_loader.py
OvertureMapsLoader ¶
OvertureMapsLoader(
theme_type_pairs: Optional[list[tuple[str, str]]] = None,
release: Optional[str] = None,
include_all_possible_columns: bool = True,
hierarchy_depth: Optional[Union[int, list[Optional[int]]]] = 1,
download_directory: Union[str, Path] = "files",
verbosity_mode: Literal["silent", "transient", "verbose"] = "transient",
max_workers: Optional[int] = None,
places_use_primary_category_only: bool = False,
places_minimal_confidence: float = 0.75,
)
Bases: Loader
OvertureMapsLoader.
The Overture Maps [1] loader loads Overture Maps features from a dedicated S3 bucket. It can download multiple data types for different release versions, and it can filter features using PyArrow [2] filters.
This loader is a wrapper around the OvertureMaestro [3] library. It utilizes the PyArrow streaming capabilities as well as the duckdb [4] engine for transforming the data into the required format.
PARAMETER | DESCRIPTION |
---|---|
theme_type_pairs | List of theme type pairs to download. If None, will download all available datasets. Defaults to None. |
release | Release version. If not provided, will automatically load the newest available release version. Defaults to None. |
include_all_possible_columns | Whether to always have the same list of columns in the resulting file. This ensures that the same set of columns is always returned for a given release across different regions. It also means that some columns might be entirely filled with False values. Defaults to True. |
hierarchy_depth | Depth used to calculate how many hierarchy columns should be used to generate the wide form of the data. Can be a single integer or a list of integers. If None, will use all available columns. Must be a non-negative integer. Defaults to 1. |
download_directory | Directory where to save the downloaded GeoParquet files. Defaults to "files". |
verbosity_mode | Set progress verbosity mode. Can be one of: silent, transient and verbose. Silent disables output completely. Transient tracks progress, but removes output after finished. Verbose leaves all progress outputs in the stdout. Defaults to "transient". |
max_workers | Max number of multiprocessing workers used to process the dataset. Defaults to None. |
places_use_primary_category_only | Whether to use only the primary category for places. Defaults to False. |
places_minimal_confidence | Minimal confidence level for the places dataset. Defaults to 0.75. |
Source code in srai/loaders/overturemaps_loader.py
load ¶
load(
area: Union[
BaseGeometry, Iterable[BaseGeometry], gpd.GeoSeries, gpd.GeoDataFrame
],
ignore_cache: bool = False,
) -> gpd.GeoDataFrame
Load Overture Maps features for a given area in a wide format.
The loader will automatically download matching GeoParquet files from the S3 bucket provided by the Overture Maps Foundation. Later it will filter features and transform them into a wide format. It will return a GeoDataFrame containing the geometry column and boolean columns for each category.
Note: Remember to set count_categories to False in CountEmbedder and its descendants.
If used with include_all_possible_columns=False, some key/value pairs might be missing from the resulting GeoDataFrame, simply because there are no such objects in the given area.
PARAMETER | DESCRIPTION |
---|---|
area | Area for which to download objects. |
ignore_cache | (bool, optional): Whether to ignore precalculated geoparquet files or not. Defaults to False. |
RETURNS | DESCRIPTION |
---|---|
GeoDataFrame | gpd.GeoDataFrame: Downloaded features as a GeoDataFrame. |
Source code in srai/loaders/overturemaps_loader.py
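The wide format with boolean columns can be pictured on plain records. The sketch below is an illustration only: `to_wide`, the `feature_id` and `category` field names, and the category labels are hypothetical. The optional `all_categories` argument mimics the idea behind include_all_possible_columns by forcing a fixed column set.

```python
from typing import Any, Optional


def to_wide(
    features: list[dict[str, Any]],
    all_categories: Optional[list[str]] = None,
) -> list[dict[str, Any]]:
    """Build one boolean column per category for each feature."""
    present = sorted({feature["category"] for feature in features})
    # With a fixed category list the columns are stable across areas;
    # without it, only categories found in the data become columns.
    columns = sorted(all_categories) if all_categories is not None else present
    return [
        {
            "feature_id": feature["feature_id"],
            **{column: feature["category"] == column for column in columns},
        }
        for feature in features
    ]


only_building = [{"feature_id": "a", "category": "building"}]
fixed_columns = to_wide(only_building, all_categories=["building", "water"])
data_driven = to_wide(only_building)
```

With the fixed column list, the "water" column exists and is filled with False even though the area has no water features. Without it, that column is simply absent.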
download_file ¶
Download a file with progress bar.
PARAMETER | DESCRIPTION |
---|---|
url | URL to download. |
fname | File name. |
chunk_size | Chunk size. |
force_download | Flag to force download even if file exists. |
Source: https://gist.github.com/yanqd0/c13ed29e29432e3cf3e7c38467f42f51
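The interplay of chunk_size and force_download can be sketched with the standard library. This is not the library's function: `download_file_sketch` is a hypothetical name, the default values are assumptions, the progress bar is left out, and a local file:// URL stands in for a remote server so the example runs offline.

```python
import tempfile
import urllib.request
from pathlib import Path


def download_file_sketch(
    url: str, fname: str, chunk_size: int = 1024, force_download: bool = False
) -> bool:
    """Stream url into fname in chunks; return False if the download was skipped."""
    target = Path(fname)
    if target.exists() and not force_download:
        return False  # reuse the existing file
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    return True


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.bin"
    source.write_bytes(b"x" * 5000)
    target = Path(tmp) / "nested" / "copy.bin"
    first = download_file_sketch(source.as_uri(), str(target), chunk_size=1024)
    second = download_file_sketch(source.as_uri(), str(target))
    copied = target.read_bytes()
```

The first call streams the file in 1024-byte chunks, and the second call is skipped because the target already exists. Passing force_download=True would fetch it again.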