Custom SQL filter¶
QuackOSM enables advanced users to filter data using SQL filters that will be used by DuckDB during processing.
The filter will be loaded alongside with OSM tags filters and features IDs filters.
SQL filter clause will can be passed both in Python API (as custom_sql_filter
parameter) and the CLI (as --custom-sql-filter
option).
Two columns available to users are: id
(type BIGINT
) and tags
(type: MAP(VARCHAR, VARCHAR)
).
You can look for available functions into a DuckDB documentation.
Below are few examples on how to use the custom SQL filters.
import quackosm as qosm
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="cardinality(tags) = 10",
)
data["tags"].head(10).values
array([{'button_operated': 'yes', 'crossing': 'traffic_signals', 'crossing:island': 'no', 'crossing:markings': 'dots', 'crossing_ref': 'pelican', 'highway': 'crossing', 'kerb': 'flush', 'tactile_paving': 'yes', 'traffic_calming': 'no', 'traffic_signals:sound': 'yes'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 18:00, Sa 12:00', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:apertures': '2', 'post_box:design': 'type_c', 'post_box:type': 'pillar', 'ref': 'KT1 12;KT1 1201', 'royal_cypher': 'GR', 'royal_cypher:wikidata': 'Q33102273'}, {'layer': '-1', 'light_rail': 'yes', 'name': 'Lewisham Platform 6', 'network': 'Docklands Light Railway', 'operator': 'KeolisAmey Docklands Ltd', 'public_transport': 'stop_position', 'railway': 'stop', 'ref': '6', 'source_ref': 'http://en.wikipedia.org/wiki/List_of_Docklands_Light_Railway_stations', 'wheelchair': 'yes'}, {'name': 'Tottenham Court Road', 'network': 'London Underground', 'operator': 'London Underground', 'public_transport': 'stop_position', 'railway': 'stop', 'ref': '3', 'subway': 'yes', 'wikidata': 'Q1431094', 'wikimedia_commons': 'Category:Tottenham_Court_Road_tube_station', 'wikipedia': 'en:Tottenham Court Road station'}, {'crossing': 'uncontrolled', 'crossing:island': 'no', 'crossing:markings': 'zebra', 'crossing_ref': 'zebra', 'flashing_lights': 'yes', 'highway': 'crossing', 'kerb': 'flush', 'name': 'Cyril Payne Crossing', 'tactile_paving': 'yes', 'traffic_calming': 'table'}, {'button_operated': 'yes', 'crossing': 'traffic_signals', 'crossing:island': 'no', 'crossing:markings': 'dots', 'crossing:signals': 'yes', 'crossing_ref': 'pelican', 'highway': 'crossing', 'tactile_paving': 'yes', 'traffic_signals:sound': 'no', 'traffic_signals:vibration': 'yes'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 17:30; Sa 10:30', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_b', 'post_box:type': 'pillar', 'ref': 'SE19 30', 'royal_cypher': 'GR', 'royal_cypher:wikidata': 'Q33102273'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 17:30; Sa 10:30', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_b', 'post_box:type': 'pillar', 'ref': 'SE19 29', 'royal_cypher': 'GR', 'royal_cypher:wikidata': 'Q33102273'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 17:30; Sa 11:30; Su off', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_a', 'post_box:type': 'pillar', 'ref': 'SE19 4', 'royal_cypher': 'EIIR', 'royal_cypher:wikidata': 'Q33102113'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 17:30; Sa 12:00', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_a', 'post_box:type': 'pillar', 'ref': 'SE19 14', 'royal_cypher': 'EIIR', 'royal_cypher:wikidata': 'Q33102113'}], dtype=object)
print("All have exactly 10 tags:", (data["tags"].str.len() == 10).all())
All have exactly 10 tags: True
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="id % 13 = 0 AND starts_with(id::STRING, '6')",
)
data
tags | geometry | |
---|---|---|
feature_id | ||
node/69140552 | {'crossing': 'uncontrolled', 'crossing_ref': '... | POINT (-0.18262 51.36975) |
node/6434537096 | {'traffic_calming': 'cushion'} | POINT (-0.2232 51.39937) |
node/6435293527 | {'amenity': 'post_box', 'brand': 'Royal Mail',... | POINT (0.1138 51.37088) |
node/6435762515 | {'denotation': 'agricultural', 'leaf_cycle': '... | POINT (-0.38918 51.44851) |
node/6436237769 | {'barrier': 'gate'} | POINT (-0.04248 51.52343) |
... | ... | ... |
way/697288202 | {'addr:city': 'New Malden', 'addr:housenumber'... | POLYGON ((-0.23748 51.39789, -0.23745 51.39794... |
way/697288215 | {'addr:city': 'New Malden', 'addr:housenumber'... | POLYGON ((-0.23716 51.3986, -0.23714 51.39865,... |
way/697288228 | {'addr:city': 'New Malden', 'addr:housenumber'... | POLYGON ((-0.23682 51.39943, -0.23679 51.39948... |
way/697300968 | {'landuse': 'grass'} | POLYGON ((-0.05488 51.51475, -0.05421 51.51473... |
way/697404487 | {'addr:city': 'New Malden', 'addr:housenumber'... | POLYGON ((-0.23687 51.39756, -0.2368 51.3976, ... |
13533 rows × 2 columns
print("All starting with digit 6:", data.index.map(lambda x: x.split("/")[1].startswith("6")).all())
print("All divisible by 13:", data.index.map(lambda x: (int(x.split("/")[1]) % 13) == 0).all())
All starting with digit 6: True All divisible by 13: True
Find features that have all selected tags present¶
When using osm_tags_filter
with value { "building": True, "historic": True, "name": True }
, the result will contain every feature that have at least one of those tags.
Positive tags filters are combined using an OR
operator. You can read more about it here.
To get filters with AND
operator, the custom_sql_filter
parameter has to be used.
To match a list of keys against given values we have to use list-related functions.
More LIST
functions are available here.
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="list_has_all(map_keys(tags), ['building', 'historic', 'name'])",
)
data
tags | geometry | |
---|---|---|
feature_id | ||
node/5006866007 | {'building': 'yes', 'historic': 'yes', 'name':... | POINT (-0.30881 51.46083) |
way/204064503 | {'addr:city': 'London', 'addr:postcode': 'WC2R... | POLYGON ((-0.11288 51.51056, -0.11286 51.51056... |
way/204148499 | {'addr:city': 'London', 'addr:housename': 'Pro... | POLYGON ((-0.05124 51.50714, -0.05112 51.50698... |
way/204334615 | {'addr:city': 'London', 'addr:postcode': 'SE10... | POLYGON ((0.00211 51.49681, 0.0021 51.4968, 0.... |
way/204906727 | {'addr:city': 'Perivale', 'addr:street': 'Peri... | POLYGON ((-0.32302 51.53194, -0.32301 51.53188... |
... | ... | ... |
relation/3962876 | {'building': 'yes', 'height': '5', 'historic':... | POLYGON ((-0.1287 51.50819, -0.12874 51.50818,... |
relation/3962878 | {'artist_name': 'Francis Leggatt Chantrey', 'a... | POLYGON ((-0.1276 51.50838, -0.12757 51.50838,... |
relation/5208404 | {'access': 'permit', 'addr:city': 'London', 'a... | POLYGON ((-0.1422 51.50176, -0.14215 51.50171,... |
relation/6036811 | {'addr:city': 'London', 'addr:postcode': 'SW1V... | POLYGON ((-0.14476 51.4959, -0.1447 51.49588, ... |
node/7731913773 | {'access': 'private', 'building': 'residential... | POINT (-0.46994 51.59564) |
361 rows × 2 columns
tags_names = ["name", "building", "historic"]
for tag_name in tags_names:
data[tag_name] = data["tags"].apply(lambda x, tag_name=tag_name: x.get(tag_name))
data[[*tags_names, "geometry"]].explore(tiles="CartoDB DarkMatter", color="orange")
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="""
list_has_all(map_keys(tags), ['highway', 'name'])
AND regexp_matches(tags['name'][1], '^(New|Old)\s\w+')
""",
)
data
tags | geometry | |
---|---|---|
feature_id | ||
node/469777911 | {'bus': 'yes', 'highway': 'bus_stop', 'local_r... | POINT (-0.01813 51.34288) |
node/469777912 | {'bus': 'yes', 'highway': 'bus_stop', 'local_r... | POINT (-0.01727 51.34301) |
node/469778292 | {'bus': 'yes', 'highway': 'bus_stop', 'local_r... | POINT (0.08189 51.48132) |
node/469780168 | {'bench': 'yes', 'bus': 'yes', 'highway': 'bus... | POINT (-0.06248 51.34367) |
node/469780169 | {'bench': 'yes', 'bus': 'yes', 'highway': 'bus... | POINT (-0.06222 51.34367) |
... | ... | ... |
way/1144322583 | {'highway': 'residential', 'lit': 'yes', 'maxs... | LINESTRING (-0.0055 51.42549, -0.00547 51.4254... |
way/1144621888 | {'footway': 'sidewalk', 'highway': 'footway', ... | LINESTRING (-0.00536 51.42549, -0.00539 51.425... |
way/1146156714 | {'cycleway:left': 'no', 'cycleway:right': 'lan... | LINESTRING (-0.09523 51.52437, -0.09534 51.52435) |
way/1159357379 | {'footway': 'sidewalk', 'highway': 'footway', ... | LINESTRING (-0.0017 51.48401, -0.00177 51.48399) |
way/1323879685 | {'highway': 'residential', 'lit': 'yes', 'maxs... | LINESTRING (-0.12817 51.51454, -0.12819 51.514... |
1902 rows × 2 columns
ways_only = data[data.index.str.startswith("way/")]
ways_only["name"] = ways_only["tags"].apply(lambda x: x["name"])
ways_only["prefix"] = ways_only["name"].apply(lambda x: x.split()[0])
ways_only[["name", "prefix", "geometry"]].explore(
tiles="CartoDB DarkMatter", column="prefix", cmap=["orange", "royalblue"]
)
/root/development/quackosm/.venv/lib/python3.10/site-packages/geopandas/geodataframe.py:1819: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value) /root/development/quackosm/.venv/lib/python3.10/site-packages/geopandas/geodataframe.py:1819: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value)