Custom SQL filter¶
QuackOSM enables advanced users to filter data using SQL filters that will be used by DuckDB during processing.
The filter will be loaded alongside with OSM tags filters and features IDs filters.
SQL filter clause will can be passed both in Python API (as custom_sql_filter
parameter) and the CLI (as --custom-sql-filter
option).
Two columns available to users are: id
(type BIGINT
) and tags
(type: MAP(VARCHAR, VARCHAR)
).
You can look for available functions into a DuckDB documentation.
Below are few examples on how to use the custom SQL filters.
import quackosm as qosm
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="cardinality(tags) = 10",
)
data["tags"].head(10).values
Finished operation in 0:00:30
array([{'check_date': '2024-03-25', 'highway': 'trunk', 'lanes': '2', 'lit': 'yes', 'maxspeed': '40 mph', 'name': 'Kingston Road', 'operator': 'Transport for London', 'ref': 'A243', 'source:name': 'knowledge', 'surface': 'asphalt'}, {'bus': 'yes', 'highway': 'bus_stop', 'name': 'Malden Rushett / Shy Horse', 'naptan:AtcoCode': '490013552S', 'naptan:Bearing': 'S', 'naptan:CommonName': 'The Shy Horse', 'naptan:Street': 'LEATHERHEAD ROAD', 'naptan:verified': 'no', 'public_transport': 'platform', 'shelter': 'yes'}, {'bus': 'yes', 'highway': 'bus_stop', 'name': 'Malden Rushett / Shy Horse', 'naptan:AtcoCode': '490013552N', 'naptan:Bearing': 'N', 'naptan:CommonName': 'The Shy Horse', 'naptan:Street': 'LEATHERHEAD ROAD', 'naptan:verified': 'no', 'public_transport': 'platform', 'shelter': 'yes'}, {'bus': 'yes', 'highway': 'bus_stop', 'name': 'Rushett Lane', 'naptan:AtcoCode': '490011796N', 'naptan:Bearing': 'N', 'naptan:CommonName': 'Rushett Lane', 'naptan:Street': 'LEATHERHEAD ROAD', 'naptan:verified': 'no', 'public_transport': 'platform', 'shelter': 'yes'}, {'bus': 'yes', 'highway': 'bus_stop', 'name': 'Rushett Lane', 'naptan:AtcoCode': '490011796S', 'naptan:Bearing': 'N', 'naptan:CommonName': 'Rushett Lane', 'naptan:Street': 'LEATHERHEAD ROAD', 'naptan:verified': 'no', 'public_transport': 'platform', 'shelter': 'yes'}, {'addr:city': 'Banstead', 'addr:postcode': 'SM7 1RD', 'addr:street': 'Banstead Road', 'description': '18 hole Par 71', 'golf:course': '18_hole', 'golf:par': '71', 'leisure': 'golf_course', 'name': 'Cuddington Golf Course', 'phone': '+44 20 8393 0952', 'website': 'https://www.cuddingtongc.co.uk/'}, {'bus': 'yes', 'highway': 'bus_stop', 'name': 'Lacey Drive', 'naptan:AtcoCode': '490008495E', 'naptan:Bearing': 'N', 'naptan:CommonName': 'Inwood Avenue', 'naptan:Street': 'LACEY DRIVE', 'naptan:verified': 'no', 'public_transport': 'platform', 'ref': '71478'}, {'bus': 'yes', 'highway': 'bus_stop', 'name': 'Lacey Drive', 'naptan:AtcoCode': '490008895W', 'naptan:Bearing': 'W', 'naptan:CommonName': 'Coulsdon Road Lacey Drive', 'naptan:Street': 'LACEY DRIVE', 'naptan:verified': 'no', 'public_transport': 'platform', 'ref': '76343'}, {'bus': 'yes', 'highway': 'bus_stop', 'name': 'Coulsdon Road / Lacey Drive', 'naptan:AtcoCode': '490008895N', 'naptan:Bearing': 'N', 'naptan:CommonName': 'Coulsdon Road Lacey Drive', 'naptan:Street': 'COULSDON ROAD', 'naptan:verified': 'yes', 'public_transport': 'platform', 'ref': '58016'}, {'bus': 'yes', 'highway': 'bus_stop', 'name': 'Coulsdon Road / Lacey Drive', 'naptan:AtcoCode': '490008895S', 'naptan:Bearing': 'N', 'naptan:CommonName': 'Coulsdon Road Lacey Drive', 'naptan:Street': 'COULSDON ROAD', 'naptan:verified': 'yes', 'public_transport': 'platform', 'ref': '55790'}], dtype=object)
print("All have exactly 10 tags:", (data["tags"].str.len() == 10).all())
All have exactly 10 tags: True
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="id % 13 = 0 AND starts_with(id::STRING, '6')",
)
data
Finished operation in 0:00:28
tags | geometry | |
---|---|---|
feature_id | ||
way/660949419 | {'building': 'garages'} | POLYGON ((-0.31115 51.38904, -0.31111 51.389, ... |
way/660949406 | {'highway': 'service'} | LINESTRING (-0.30754 51.38989, -0.30779 51.39012) |
way/660949393 | {'building': 'house'} | POLYGON ((-0.30924 51.38903, -0.30889 51.38917... |
way/660949367 | {'building': 'house'} | POLYGON ((-0.30819 51.38834, -0.3079 51.38846,... |
way/660949380 | {'building': 'house'} | POLYGON ((-0.31093 51.38833, -0.31081 51.38822... |
... | ... | ... |
way/678267668 | {'designation': 'public_footpath', 'foot': 'ye... | LINESTRING (0.13985 51.37873, 0.13934 51.37886) |
way/678267395 | {'highway': 'track'} | LINESTRING (0.14271 51.37555, 0.14265 51.3754,... |
node/6350961409 | {'barrier': 'gate'} | POINT (0.13752 51.37232) |
node/6524837735 | {'barrier': 'gate'} | POINT (0.12551 51.37234) |
way/693062890 | {'highway': 'service'} | LINESTRING (0.14192 51.36726, 0.14206 51.36734... |
13503 rows × 2 columns
print("All starting with digit 6:", data.index.map(lambda x: x.split("/")[1].startswith("6")).all())
print("All divisible by 13:", data.index.map(lambda x: (int(x.split("/")[1]) % 13) == 0).all())
All starting with digit 6: True All divisible by 13: True
Find features that have all selected tags present¶
When using osm_tags_filter
with value { "building": True, "historic": True, "name": True }
, the result will contain every feature that have at least one of those tags.
Positive tags filters are combined using an OR
operator. You can read more about it here.
To get filters with AND
operator, the custom_sql_filter
parameter has to be used.
To match a list of keys against given values we have to use list-related functions.
More LIST
functions are available here.
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="list_has_all(map_keys(tags), ['building', 'historic', 'name'])",
)
data
Finished operation in 0:00:29
tags | geometry | |
---|---|---|
feature_id | ||
way/34002538 | {'alt_name': 'Coronation Clock Tower', 'amenit... | POLYGON ((-0.30444 51.39428, -0.30443 51.39425... |
way/149186805 | {'addr:city': 'Wallington', 'addr:postcode': '... | POLYGON ((-0.15524 51.36893, -0.15524 51.3689,... |
way/1198803262 | {'addr:city': 'Worcester Park', 'addr:housenam... | POLYGON ((-0.25301 51.38469, -0.25298 51.38463... |
way/69960935 | {'building': 'yes', 'historic': 'yes', 'name':... | POLYGON ((-0.21662 51.35826, -0.21661 51.35822... |
way/69955115 | {'amenity': 'townhall', 'building': 'yes', 'hi... | POLYGON ((-0.21658 51.35898, -0.21638 51.359, ... |
... | ... | ... |
way/1297142053 | {'building': 'yes', 'historic': 'building', 'n... | POLYGON ((-0.08929 51.30816, -0.08928 51.30822... |
way/1297142052 | {'building': 'yes', 'historic': 'building', 'n... | POLYGON ((-0.09258 51.30978, -0.09271 51.30976... |
way/1297142045 | {'building': 'yes', 'historic': 'building', 'n... | POLYGON ((-0.09477 51.30946, -0.09489 51.30943... |
way/1050623081 | {'building': 'yes', 'historic': 'archaeologica... | POLYGON ((0.10786 51.38834, 0.10805 51.38828, ... |
way/144475030 | {'building': 'yes', 'historic': 'monument', 'n... | POLYGON ((0.10474 51.37903, 0.10481 51.379, 0.... |
368 rows × 2 columns
tags_names = ["name", "building", "historic"]
for tag_name in tags_names:
data[tag_name] = data["tags"].apply(lambda x, tag_name=tag_name: x.get(tag_name))
data[[*tags_names, "geometry"]].explore(tiles="CartoDB DarkMatter", color="orange")
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="""
list_has_all(map_keys(tags), ['highway', 'name'])
AND regexp_matches(tags['name'], '^(New|Old)\s\w+')
""",
)
data
<>:6: SyntaxWarning: invalid escape sequence '\s' <>:6: SyntaxWarning: invalid escape sequence '\s' /tmp/ipykernel_13394/2937828388.py:6: SyntaxWarning: invalid escape sequence '\s' AND regexp_matches(tags['name'], '^(New|Old)\s\w+')
Finished operation in 0:00:29
tags | geometry | |
---|---|---|
feature_id | ||
node/469777178 | {'bench': 'yes', 'bin': 'yes', 'bus': 'yes', '... | POINT (-0.11993 51.30662) |
node/469774294 | {'bench': 'no', 'bin': 'no', 'bus': 'yes', 'hi... | POINT (-0.1286 51.30796) |
node/469774295 | {'bench': 'no', 'bin': 'no', 'bus': 'yes', 'hi... | POINT (-0.12839 51.30772) |
node/469777176 | {'bench': 'yes', 'bin': 'yes', 'bus': 'yes', '... | POINT (-0.12081 51.30735) |
node/469777177 | {'bench': 'yes', 'bus': 'yes', 'highway': 'bus... | POINT (-0.12044 51.30738) |
... | ... | ... |
way/1381693626 | {'hgv': 'discouraged', 'highway': 'unclassifie... | LINESTRING (0.08582 51.35122, 0.08639 51.35055... |
way/28468970 | {'hgv': 'discouraged', 'highway': 'unclassifie... | LINESTRING (0.08582 51.35122, 0.08492 51.35205... |
way/9043147 | {'highway': 'residential', 'name': 'New Road'} | LINESTRING (0.10048 51.38118, 0.1003 51.38097,... |
way/112394271 | {'highway': 'residential', 'name': 'Old Church... | LINESTRING (0.10718 51.378, 0.1072 51.37805, 0... |
way/36605623 | {'highway': 'residential', 'maxspeed': '30 mph... | LINESTRING (0.10889 51.37825, 0.10871 51.37828... |
1938 rows × 2 columns
ways_only = data[data.index.str.startswith("way/")]
ways_only["name"] = ways_only["tags"].apply(lambda x: x["name"])
ways_only["prefix"] = ways_only["name"].apply(lambda x: x.split()[0])
ways_only[["name", "prefix", "geometry"]].explore(
tiles="CartoDB DarkMatter", column="prefix", cmap=["orange", "royalblue"]
)
/opt/hostedtoolcache/Python/3.12.10/x64/lib/python3.12/site-packages/geopandas/geodataframe.py:1819: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value) /opt/hostedtoolcache/Python/3.12.10/x64/lib/python3.12/site-packages/geopandas/geodataframe.py:1819: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value)