Custom SQL filter¶
QuackOSM enables advanced users to filter data using SQL filters that will be used by DuckDB during processing.
The filter will be loaded alongside with OSM tags filters and features IDs filters.
SQL filter clause will can be passed both in Python API (as custom_sql_filter
parameter) and the CLI (as --custom-sql-filter
option).
Two columns available to users are: id
(type BIGINT
) and tags
(type: MAP(VARCHAR, VARCHAR)
).
You can look for available functions into a DuckDB documentation.
Below are few examples on how to use the custom SQL filters.
import quackosm as qosm
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="cardinality(tags) = 10",
)
data["tags"].head(10).values
0%| | 0.00/107M [00:00<?, ?B/s]
0%| | 8.19k/107M [00:00<26:20, 68.0kB/s]
0%| | 36.9k/107M [00:00<11:13, 160kB/s]
0%| | 90.1k/107M [00:00<06:31, 274kB/s]
0%| | 197k/107M [00:00<03:39, 489kB/s]
0%|▏ | 414k/107M [00:00<01:57, 913kB/s]
1%|▎ | 844k/107M [00:00<01:01, 1.72MB/s]
2%|▌ | 1.71M/107M [00:00<00:31, 3.32MB/s]
3%|█▏ | 3.43M/107M [00:01<00:16, 6.46MB/s]
6%|██▏ | 6.09M/107M [00:01<00:09, 10.8MB/s]
8%|██▉ | 8.34M/107M [00:01<00:07, 12.7MB/s]
10%|███▊ | 10.6M/107M [00:01<00:06, 14.2MB/s]
12%|████▌ | 12.9M/107M [00:01<00:06, 15.1MB/s]
14%|█████▎ | 15.2M/107M [00:01<00:05, 15.8MB/s]
16%|██████▏ | 17.4M/107M [00:01<00:05, 16.0MB/s]
18%|██████▉ | 19.5M/107M [00:01<00:05, 16.2MB/s]
20%|███████▋ | 21.7M/107M [00:02<00:05, 16.3MB/s]
22%|████████▍ | 23.9M/107M [00:02<00:05, 16.4MB/s]
24%|█████████▏ | 26.1M/107M [00:02<00:04, 16.5MB/s]
26%|██████████ | 28.3M/107M [00:02<00:04, 16.7MB/s]
29%|██████████▊ | 30.6M/107M [00:02<00:04, 16.9MB/s]
31%|███████████▋ | 32.9M/107M [00:02<00:04, 17.1MB/s]
33%|████████████▍ | 35.2M/107M [00:02<00:04, 17.1MB/s]
35%|█████████████▏ | 37.4M/107M [00:03<00:04, 17.1MB/s]
37%|██████████████ | 39.7M/107M [00:03<00:03, 17.1MB/s]
39%|██████████████▊ | 41.9M/107M [00:03<00:03, 17.1MB/s]
41%|███████████████▌ | 44.1M/107M [00:03<00:03, 17.1MB/s]
43%|████████████████▍ | 46.4M/107M [00:03<00:03, 17.2MB/s]
45%|█████████████████▏ | 48.7M/107M [00:03<00:03, 17.2MB/s]
47%|██████████████████ | 50.9M/107M [00:03<00:03, 17.1MB/s]
49%|██████████████████▊ | 53.2M/107M [00:03<00:03, 17.2MB/s]
52%|███████████████████▌ | 55.4M/107M [00:04<00:03, 17.2MB/s]
54%|████████████████████▍ | 57.7M/107M [00:04<00:02, 17.2MB/s]
56%|█████████████████████▏ | 59.9M/107M [00:04<00:02, 17.2MB/s]
58%|█████████████████████▉ | 62.2M/107M [00:04<00:02, 17.1MB/s]
60%|██████████████████████▊ | 64.4M/107M [00:04<00:02, 17.1MB/s]
62%|███████████████████████▌ | 66.7M/107M [00:04<00:02, 17.1MB/s]
64%|████████████████████████▍ | 68.9M/107M [00:04<00:02, 17.2MB/s]
66%|█████████████████████████▏ | 71.2M/107M [00:04<00:02, 17.1MB/s]
68%|█████████████████████████▉ | 73.4M/107M [00:05<00:01, 17.1MB/s]
70%|██████████████████████████▋ | 75.6M/107M [00:05<00:01, 17.1MB/s]
72%|███████████████████████████▌ | 77.9M/107M [00:05<00:01, 17.1MB/s]
75%|████████████████████████████▎ | 80.1M/107M [00:05<00:01, 17.1MB/s]
77%|█████████████████████████████▏ | 82.4M/107M [00:05<00:01, 17.2MB/s]
79%|█████████████████████████████▉ | 84.7M/107M [00:05<00:01, 17.2MB/s]
81%|██████████████████████████████▋ | 86.9M/107M [00:05<00:01, 17.2MB/s]
83%|███████████████████████████████▌ | 89.2M/107M [00:06<00:01, 17.2MB/s]
85%|████████████████████████████████▎ | 91.4M/107M [00:06<00:00, 17.1MB/s]
87%|█████████████████████████████████ | 93.6M/107M [00:06<00:00, 17.0MB/s]
89%|█████████████████████████████████▉ | 95.8M/107M [00:06<00:00, 17.0MB/s]
91%|██████████████████████████████████▋ | 98.0M/107M [00:06<00:00, 17.0MB/s]
93%|████████████████████████████████████▍ | 100M/107M [00:06<00:00, 17.0MB/s]
95%|█████████████████████████████████████▏ | 103M/107M [00:06<00:00, 17.1MB/s]
97%|██████████████████████████████████████ | 105M/107M [00:06<00:00, 17.0MB/s]
100%|██████████████████████████████████████▊| 107M/107M [00:07<00:00, 17.1MB/s]
0%| | 0.00/107M [00:00<?, ?B/s]
100%|████████████████████████████████████████| 107M/107M [00:00<00:00, 190GB/s]
Finished operation in 0:00:28
array([{'bicycle': 'yes', 'button_operated': 'yes', 'crossing': 'traffic_signals', 'crossing:island': 'no', 'crossing:markings': 'dots', 'crossing:signals': 'yes', 'crossing_ref': 'toucan', 'highway': 'crossing', 'segregated': 'no', 'tactile_paving': 'yes'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 09:00; Sa 07:00', 'drive_through': 'no', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:design': 'type_a', 'post_box:type': 'pillar', 'ref': 'SE5 42D', 'royal_cypher': 'VR', 'royal_cypher:wikidata': 'Q33102356'}, {'brand': 'Co-op Food', 'brand:wikidata': 'Q3277439', 'check_date': '2024-07-15', 'check_date:opening_hours': '2024-07-15', 'name': 'Co-op Food', 'opening_hours': 'Mo-Su 07:00-22:00', 'operator': 'The Co-operative Group', 'operator:wikidata': 'Q117202', 'shop': 'supermarket', 'website': 'https://www.coop.co.uk/store-finder/SM5-3AG/42-44-high-street'}, {'amenity': 'post_box', 'brand': 'Royal Mail', 'brand:wikidata': 'Q638098', 'collection_times': 'Mo-Fr 09:00; Sa 07:00; Su off', 'mapillary': '299948391764915', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:type': 'pillar', 'ref': 'TW8 209D', 'survey:date': '2021-03-24'}, {'addr:city': 'Northolt', 'addr:county': 'Middlesex', 'addr:housenumber': '19', 'addr:postcode': 'UB5 4RR', 'addr:street': 'Oldfields Circus', 'amenity': 'cafe', 'fhrs:id': '1034156', 'name': 'Café Express', 'source:addr': 'FHRS Open Data', 'wheelchair': 'limited'}, {'addr:city': 'London', 'addr:housenumber': '1a', 'addr:postcode': 'SW20 0LL', 'addr:street': 'Amity Grove', 'amenity': 'post_office', 'name': 'Raynes Park', 'opening_hours': 'Mo-Fr 09:00-17:30; Tu 09:30-17:30; Sa 09:00-12:30; Su,PH off', 'ref:pol_id': '17960', 'website': 'https://www.postoffice.co.uk/branch-finder', 'wheelchair': 'yes'}, {'amenity': 'post_box', 'collection_times': 'Mo-Fr 16:15; Sa 11:45', 'colour': 'gold', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:apertures': '2', 'post_box:type': 'pillar', 'ref': 'SM3 215;SM3 315', 'royal_cypher': 'EIIR', 'royal_cypher:wikidata': 'Q33102113'}, {'amenity': 'post_box', 'brand': 'Royal Mail', 'brand:wikidata': 'Q638098', 'brand:wikipedia': 'en:Royal Mail', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:type': 'pillar', 'ref': 'SW20 39', 'royal_cypher': 'GR', 'royal_cypher:wikidata': 'Q33102273'}, {'amenity': 'post_box', 'brand': 'Royal Mail', 'brand:wikidata': 'Q638098', 'brand:wikipedia': 'en:Royal Mail', 'operator': 'Royal Mail', 'operator:wikidata': 'Q638098', 'post_box:type': 'pillar', 'ref': 'SW20 21', 'royal_cypher': 'GR', 'royal_cypher:wikidata': 'Q33102273'}, {'addr:city': 'Carshalton', 'addr:housenumber': '213', 'addr:postcode': 'SM5 3PZ', 'addr:street': 'Carshalton Road', 'brand': 'Best-one', 'brand:wikidata': 'Q4896532', 'fhrs:authority': 'Sutton', 'fhrs:id': '898841', 'name': 'Best-one', 'shop': 'convenience'}], dtype=object)
print("All have exactly 10 tags:", (data["tags"].str.len() == 10).all())
All have exactly 10 tags: True
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="id % 13 = 0 AND starts_with(id::STRING, '6')",
)
data
Finished operation in 0:00:25
tags | geometry | |
---|---|---|
feature_id | ||
node/69140552 | {'crossing': 'uncontrolled', 'crossing_ref': '... | POINT (-0.18262 51.36975) |
node/6345933555 | {'addr:city': 'London', 'addr:housenumber': '8... | POINT (-0.09397 51.63533) |
node/6346059993 | {'addr:housenumber': '82', 'addr:street': 'Que... | POINT (-0.18748 51.51285) |
node/6346205528 | {'access': 'private', 'barrier': 'gate'} | POINT (-0.23552 51.40312) |
node/6346205567 | {'addr:city': 'London', 'addr:housenumber': '2... | POINT (-0.23494 51.40513) |
... | ... | ... |
relation/6679699 | {'addr:city': 'London', 'addr:flats': '57-70',... | POLYGON ((-0.13858 51.49571, -0.13847 51.49575... |
relation/6731491 | {'building': 'yes', 'name': 'Thames Wharf Stud... | MULTIPOLYGON (((-0.22461 51.4838, -0.22412 51.... |
relation/6794970 | {'addr:city': 'London', 'addr:postcode': 'N12 ... | POLYGON ((-0.17971 51.61437, -0.17948 51.61434... |
relation/6922968 | {'addr:housenumber': '14', 'addr:street': 'Ors... | POLYGON ((-0.07966 51.53644, -0.0798 51.53645,... |
relation/6981767 | {'bicycle': 'no', 'foot': 'private', 'highway'... | POLYGON ((-0.02311 51.48131, -0.02311 51.48132... |
13533 rows × 2 columns
print("All starting with digit 6:", data.index.map(lambda x: x.split("/")[1].startswith("6")).all())
print("All divisible by 13:", data.index.map(lambda x: (int(x.split("/")[1]) % 13) == 0).all())
All starting with digit 6: True All divisible by 13: True
Find features that have all selected tags present¶
When using osm_tags_filter
with value { "building": True, "historic": True, "name": True }
, the result will contain every feature that have at least one of those tags.
Positive tags filters are combined using an OR
operator. You can read more about it here.
To get filters with AND
operator, the custom_sql_filter
parameter has to be used.
To match a list of keys against given values we have to use list-related functions.
More LIST
functions are available here.
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="list_has_all(map_keys(tags), ['building', 'historic', 'name'])",
)
data
Finished operation in 0:00:27
tags | geometry | |
---|---|---|
feature_id | ||
node/11899012453 | {'building': 'yes', 'historic': 'archaeologica... | POINT (-0.12403 51.50456) |
way/183993058 | {'addr:city': 'London', 'addr:housenumber': '1... | POLYGON ((-0.13995 51.50452, -0.14024 51.50484... |
way/204064503 | {'addr:city': 'London', 'addr:postcode': 'WC2R... | POLYGON ((-0.11288 51.51056, -0.11286 51.51056... |
way/204148499 | {'addr:city': 'London', 'addr:housename': 'Pro... | POLYGON ((-0.05124 51.50714, -0.05112 51.50698... |
way/204334615 | {'addr:city': 'London', 'addr:postcode': 'SE10... | POLYGON ((0.00211 51.49681, 0.0021 51.4968, 0.... |
... | ... | ... |
relation/3609581 | {'addr:city': 'London', 'building': 'yes', 'ca... | POLYGON ((-0.21571 51.47035, -0.21583 51.47033... |
relation/3962876 | {'building': 'yes', 'height': '5', 'historic':... | POLYGON ((-0.1287 51.50819, -0.12874 51.50818,... |
relation/3962878 | {'artist_name': 'Francis Leggatt Chantrey', 'a... | POLYGON ((-0.1276 51.50838, -0.12757 51.50838,... |
relation/5208404 | {'access': 'permit', 'addr:city': 'London', 'a... | POLYGON ((-0.1422 51.50176, -0.14215 51.50171,... |
relation/6036811 | {'addr:city': 'London', 'addr:postcode': 'SW1V... | POLYGON ((-0.14476 51.4959, -0.1447 51.49588, ... |
361 rows × 2 columns
tags_names = ["name", "building", "historic"]
for tag_name in tags_names:
data[tag_name] = data["tags"].apply(lambda x, tag_name=tag_name: x.get(tag_name))
data[[*tags_names, "geometry"]].explore(tiles="CartoDB DarkMatter", color="orange")
data = qosm.convert_geometry_to_geodataframe(
geometry_filter=qosm.geocode_to_geometry("Greater London"),
osm_extract_source="Geofabrik",
custom_sql_filter="""
list_has_all(map_keys(tags), ['highway', 'name'])
AND regexp_matches(tags['name'][1], '^(New|Old)\s\w+')
""",
)
data
<>:4: SyntaxWarning: invalid escape sequence '\s' <>:4: SyntaxWarning: invalid escape sequence '\s' /tmp/ipykernel_16674/1700218434.py:4: SyntaxWarning: invalid escape sequence '\s' custom_sql_filter="""
Finished operation in 0:00:28
tags | geometry | |
---|---|---|
feature_id | ||
node/4663258181 | {'bench': 'yes', 'bin': 'yes', 'bus': 'yes', '... | POINT (-0.04845 51.47403) |
node/5827407188 | {'bus': 'yes', 'highway': 'bus_stop', 'local_r... | POINT (-0.04261 51.47672) |
way/118764502 | {'highway': 'tertiary', 'lit': 'yes', 'maxspee... | LINESTRING (0.03933 51.52455, 0.03925 51.52455... |
way/139442476 | {'alt_name': 'New Kings Road', 'cycleway': 'la... | LINESTRING (-0.20741 51.47083, -0.20729 51.470... |
way/139732196 | {'cycleway:left': 'shared', 'cycleway:right': ... | LINESTRING (-0.12601 51.49909, -0.12592 51.49884) |
... | ... | ... |
way/995542664 | {'foot': 'yes', 'highway': 'secondary', 'lanes... | LINESTRING (0.02967 51.47551, 0.02971 51.47549... |
way/993721223 | {'busway:right': 'lane', 'cycleway:left': 'sep... | LINESTRING (-0.0862 51.526, -0.08651 51.52592) |
way/995542654 | {'highway': 'secondary', 'lanes': '2', 'lit': ... | LINESTRING (0.02959 51.47552, 0.02967 51.47551) |
way/995542662 | {'highway': 'secondary', 'lanes': '1', 'lit': ... | LINESTRING (0.02967 51.47551, 0.02974 51.4755,... |
way/1011318471 | {'bicycle': 'yes', 'highway': 'pedestrian', 'l... | LINESTRING (-0.18968 51.45961, -0.18981 51.45956) |
1902 rows × 2 columns
ways_only = data[data.index.str.startswith("way/")]
ways_only["name"] = ways_only["tags"].apply(lambda x: x["name"])
ways_only["prefix"] = ways_only["name"].apply(lambda x: x.split()[0])
ways_only[["name", "prefix", "geometry"]].explore(
tiles="CartoDB DarkMatter", column="prefix", cmap=["orange", "royalblue"]
)
/home/runner/.local/lib/python3.12/site-packages/geopandas/geodataframe.py:1819: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value) /home/runner/.local/lib/python3.12/site-packages/geopandas/geodataframe.py:1819: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value)