Custom SQL filter¶
QuackOSM enables advanced users to filter data using SQL filters that will be used by DuckDB during processing.
The filter will be loaded alongside with OSM tags filters and features IDs filters.
SQL filter clause will can be passed both in Python API (as custom_sql_filter
parameter) and the CLI (as --custom-sql-filter
option).
Two columns available to users are: id
(type BIGINT
) and tags
(type: MAP(VARCHAR, VARCHAR)
).
You can look for available functions into a DuckDB documentation.
Below are few examples on how to use the custom SQL filters.
import quackosm as qosm
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="cardinality(tags) = 10",
)
data["tags"].head(10).values
array([{'crossing': 'traffic_signals', 'crossing:island': 'no', 'crossing:markings': 'dots', 'crossing:signals': 'yes', 'highway': 'crossing', 'kerb': 'flush', 'ref:GB:tflcid': 'RWG079311', 'surface': 'asphalt', 'tactile_paving': 'yes', 'traffic_calming': 'table'}, {'crossing': 'uncontrolled', 'crossing:island': 'yes', 'crossing:markings': 'zebra;dots', 'crossing_ref': 'zebra', 'highway': 'crossing', 'kerb': 'flush', 'ref:GB:tflcid': 'RWG079028', 'surface': 'asphalt', 'tactile_paving': 'yes', 'traffic_calming': 'table'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 17:30; Sa 12:00; Su off', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_a', 'post_box:type': 'pillar', 'ref': 'SE16 23', 'royal_cypher': 'VR', 'royal_cypher:wikidata': 'Q33102356'}, {'bus': 'yes', 'highway': 'bus_stop', 'local_ref': 'E', 'name': 'The Grange', 'naptan:AtcoCode': '490017366W', 'naptan:Bearing': 'W', 'naptan:CommonName': 'The Grange', 'naptan:Indicator': 'Stop E', 'public_transport': 'platform', 'ref': '74052'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 17:30; Sa 12:00', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_a', 'post_box:type': 'pillar', 'ref': 'SE22 11', 'royal_cypher': 'VR', 'royal_cypher:wikidata': 'Q33102356'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 09:00; Sa 07:00', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_a', 'post_box:type': 'pillar', 'ref': 'SE21 24D', 'royal_cypher': 'EIIR', 'royal_cypher:wikidata': 'Q33102113'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 17:30; Sa 12:00', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_b', 'post_box:type': 'pillar', 'ref': 'SE22 31', 'royal_cypher': 'VR', 'royal_cypher:wikidata': 'Q33102356'}, {'addr:city': 'Surbiton', 'addr:housenumber': '5', 'addr:postcode': 'KT5 8SA', 'addr:street': 'Howard Road', 'disused': 'yes', 'name': 'Angel Inn', 'rooms': '8', 'toilets': 'yes', 'toilets:access': 'customers', 'tourism': 'guest_house'}, {'amenity': 'post_box', 'brand': 'Royal Mail', 'brand:wikidata': 'Q638098', 'collection_times': 'Mo-Fr 16:15, Sa 11:15', 'colour': 'red', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'ref': 'TW7 274', 'royal_cypher': 'GR', 'royal_cypher:wikidata': 'Q33102273'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 17:30; Sa 12:00; Su off', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:apertures': '2', 'post_box:design': 'type_c', 'post_box:type': 'pillar', 'ref': 'SE16 21;SE16 121', 'royal_cypher': 'EIIR', 'royal_cypher:wikidata': 'Q33102113'}], dtype=object)
print("All have exactly 10 tags:", (data["tags"].str.len() == 10).all())
All have exactly 10 tags: True
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="id % 13 = 0 AND starts_with(id::STRING, '6')",
)
data
tags | geometry | |
---|---|---|
feature_id | ||
node/60126924 | {'highway': 'crossing'} | POINT (-0.01467 51.38001) |
node/60240570 | {'height': '65.5', 'man_made': 'tower', 'name'... | POINT (-0.12917 51.5945) |
node/60241818 | {'noexit': 'yes'} | POINT (-0.1189 51.58996) |
node/60258094 | {'waterway': 'lock_gate'} | POINT (-0.49473 51.62545) |
node/6037626283 | {'bus': 'yes', 'local_ref': '0-1', 'name': 'Vi... | POINT (-0.14883 51.49212) |
... | ... | ... |
way/694514431 | {'addr:city': 'London', 'addr:country': 'GB', ... | POLYGON ((-0.01451 51.60812, -0.01464 51.60812... |
way/694514444 | {'addr:city': 'London', 'addr:country': 'GB', ... | POLYGON ((-0.01545 51.60808, -0.01558 51.60807... |
way/694514457 | {'addr:city': 'London', 'addr:country': 'GB', ... | POLYGON ((-0.01037 51.60894, -0.01029 51.60895... |
way/694514470 | {'addr:city': 'London', 'addr:country': 'GB', ... | POLYGON ((-0.01082 51.60891, -0.01074 51.60892... |
way/694514483 | {'addr:city': 'London', 'addr:country': 'GB', ... | POLYGON ((-0.01199 51.60903, -0.0121 51.60906,... |
13533 rows × 2 columns
print("All starting with digit 6:", data.index.map(lambda x: x.split("/")[1].startswith("6")).all())
print("All divisible by 13:", data.index.map(lambda x: (int(x.split("/")[1]) % 13) == 0).all())
All starting with digit 6: True All divisible by 13: True
Find features that have all selected tags present¶
When using osm_tags_filter
with value { "building": True, "historic": True, "name": True }
, the result will contain every feature that have at least one of those tags.
Positive tags filters are combined using an OR
operator. You can read more about it here.
To get filters with AND
operator, the custom_sql_filter
parameter has to be used.
To match a list of keys against given values we have to use list-related functions.
More LIST
functions are available here.
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="list_has_all(map_keys(tags), ['building', 'historic', 'name'])",
)
data
tags | geometry | |
---|---|---|
feature_id | ||
node/5006866007 | {'building': 'yes', 'historic': 'yes', 'name':... | POINT (-0.30881 51.46083) |
way/183993058 | {'addr:city': 'London', 'addr:housenumber': '1... | POLYGON ((-0.13995 51.50452, -0.14024 51.50484... |
way/192549089 | {'building': 'yes', 'building:levels': '6', 'h... | POLYGON ((-0.03491 51.51196, -0.03493 51.51197... |
way/196379420 | {'addr:city': 'London', 'addr:housenumber': '4... | POLYGON ((0.04267 51.50271, 0.04274 51.5028, 0... |
way/208705632 | {'building': 'yes', 'he:inscription_date': '19... | POLYGON ((-0.05142 51.50746, -0.05139 51.50753... |
... | ... | ... |
relation/3609581 | {'addr:city': 'London', 'building': 'yes', 'ca... | POLYGON ((-0.21571 51.47035, -0.21583 51.47033... |
relation/3962876 | {'building': 'yes', 'height': '5', 'historic':... | POLYGON ((-0.1287 51.50819, -0.12874 51.50818,... |
relation/3962878 | {'artist_name': 'Francis Leggatt Chantrey', 'a... | POLYGON ((-0.1276 51.50838, -0.12757 51.50838,... |
relation/5208404 | {'access': 'permit', 'addr:city': 'London', 'a... | POLYGON ((-0.1422 51.50176, -0.14215 51.50171,... |
relation/6036811 | {'addr:city': 'London', 'addr:postcode': 'SW1V... | POLYGON ((-0.14476 51.4959, -0.1447 51.49588, ... |
361 rows × 2 columns
tags_names = ["name", "building", "historic"]
for tag_name in tags_names:
data[tag_name] = data["tags"].apply(lambda x, tag_name=tag_name: x.get(tag_name))
data[[*tags_names, "geometry"]].explore(tiles="CartoDB DarkMatter", color="orange")
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="""
list_has_all(map_keys(tags), ['highway', 'name'])
AND regexp_matches(tags['name'], '^(New|Old)\s\w+')
""",
)
data
Finished operation in 0:01:55
tags | geometry | |
---|---|---|
feature_id | ||
node/1177293454 | {'bench': 'yes', 'bus': 'yes', 'covered': 'yes... | POINT (-0.48138 51.52238) |
node/3862617446 | {'bench': 'yes', 'bus': 'yes', 'highway': 'bus... | POINT (-0.06367 51.51995) |
node/5827407188 | {'bus': 'yes', 'highway': 'bus_stop', 'local_r... | POINT (-0.04261 51.47672) |
node/12731582 | {'highway': 'motorway_junction', 'name': 'Old ... | POINT (-0.0191 51.53192) |
node/469787416 | {'highway': 'bus_stop', 'local_ref': 'T', 'nam... | POINT (-0.39333 51.53759) |
... | ... | ... |
way/1074642426 | {'cycleway': 'sidewalk', 'foot': 'yes', 'highw... | LINESTRING (-0.08782 51.49405, -0.0883 51.4940... |
way/1074151205 | {'cycleway:right': 'lane', 'highway': 'primary... | LINESTRING (-0.25028 51.52862, -0.25009 51.52879) |
way/1304745584 | {'cycleway': 'opposite', 'highway': 'unclassif... | LINESTRING (-0.11937 51.51983, -0.11943 51.5199) |
way/1323879685 | {'highway': 'residential', 'lit': 'yes', 'maxs... | LINESTRING (-0.12817 51.51454, -0.12819 51.514... |
relation/17871649 | {'access': 'permissive', 'highway': 'pedestria... | POLYGON ((-0.10748 51.51571, -0.10749 51.51568... |
1902 rows × 2 columns