-
Notifications
You must be signed in to change notification settings - Fork 39
Description
Hi,
I am currently trying to figure out a way to use pygeofilter for converting CQL2 to a DuckDB query using the spatial extension. This extension is quite prototypical and I stumble across a couple of caveats. Here's my current approach:
import duckdb
duckdb.install_extension('spatial')
duckdb.load_extension('spatial')Define the CQL2 filter
cql2_filter = {
"op": "and",
"args": [
{
"op": ">=",
"args": [
{
"property": "end_datetime"
},
'2020-03-28T20:05:46+02'
]
},
{
"op": "<=",
"args": [
{
"property": "start_datetime"
},
'2020-03-28T22:06:15+02'
]
},
{
"op": "=",
"args": [
{
"property": "sar:instrument_mode"
},
'IW'
]
}
]
}Optionally add spatial filtering
# ext = None
ext = {'xmin': -4, 'xmax': -2, 'ymin': 6, 'ymax': 8}
if ext is not None:
arg = {
'op': 's_intersects',
'args': [
{
'property': 'geometry'
},
{
'type': 'Polygon',
'coordinates': [[[ext['xmin'], ext['ymin']],
[ext['xmin'], ext['ymax']],
[ext['xmax'], ext['ymax']],
[ext['xmax'], ext['ymin']],
[ext['xmin'], ext['ymin']]]]
}
]
}
cql2_filter['args'].append(arg)Convert CQL2 filter to SQL where clause
from pygeofilter.parsers.cql2_json import parse as json_parse
filter = json_parse(cql2_filter)from pygeofilter.backends.sql.evaluate import to_sql_where
sql_where = to_sql_where(filter, {
's1:datatake': 's1:datatake',
'datetime': 'datetime',
'sar:instrument_mode': 'sar:instrument_mode',
'end_datetime': 'end_datetime',
'start_datetime': 'start_datetime',
'geometry': 'geometry'
})Create DuckDB query for a geoparquet file:
Here it gets ugly because (1) the WKB column needs to be converted to GEOMETRY type and (2) the DuckDB-spatial implementation of ST_GeomFromWKB cannot read the WKB-HEX representation returned by to_sql_where.
import re
sql_query = "SELECT * EXCLUDE geometry, ST_GeomFromWKB(geometry) AS geometry FROM '20200301_20200401.parquet' WHERE %s" % sql_where
if ext is not None:
# convert WKB blob to GEOMETRY
sql_query = sql_query.replace('ST_Intersects("geometry"', 'ST_Intersects(ST_GeomFromWKB(geometry)')
# duckdb_spatial apparently cannot yet read wkb_hex representation -> convert it back to text
spatial = ("ST_GeomFromText('POLYGON(({xmin} {ymin}, {xmin} {ymax}, "
"{xmax} {ymax}, {xmax} {ymin}, {xmin} {ymin}))')")
sql_query = re.sub(r'ST_GeomFromWKB\(x\'.*\'\)', spatial.format(**ext), sql_query)
sql_queryExecute the query:
df = duckdb.query(sql_query)I wonder, how would you do this? Do you think there is anything that could/should be modified on the pygeofilter end or is it entirely up to the DuckDB-spatial package? I'd appreciate any help.