-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
I think some ClickBench queries are not being correctly executed, as EventDate is being treated as a string.
For example, in query 36:
SELECT "URL", COUNT(*) AS PageViews
FROM hits
WHERE "CounterID" = 62
AND "EventDate" >= '2013-07-01'
AND "EventDate" <= '2013-07-31'
AND "DontCountHits" = 0
AND "IsRefresh" = 0
AND "URL" <> ''
GROUP BY "URL"
ORDER BY PageViews DESC
LIMIT 10;When executing it, EventDate (originally a UInt16) will be casted to Utf8:
DataSourceExec: ...
predicate=CounterID@1 = 62
AND CAST(EventDate@0 AS Utf8) >= 2013-07-01
AND CAST(EventDate@0 AS Utf8) <= 2013-07-31
AND DontCountHits@4 = 0
AND IsRefresh@3 = 0
AND URL@2 != , ...
This in turn makes the query return nothing. If we instead cast the EventDate first to date then it will work as expected:
-- current
SELECT "URL", COUNT(*) AS PageViews
FROM hits
WHERE "CounterID" = 62
AND "EventDate" >= '2013-07-01'
AND "EventDate" <= '2013-07-31'
AND "DontCountHits" = 0
AND "IsRefresh" = 0
AND "URL" <> ''
GROUP BY "URL"
ORDER BY PageViews DESC
LIMIT 10;
+-----+-----------+
| URL | pageviews |
+-----+-----------+
+-----+-----------+
0 row(s) fetched.
Elapsed 0.120 seconds.
-- with cast
SELECT "URL", COUNT(*) AS PageViews
FROM hits
WHERE "CounterID" = 62
AND "EventDate"::int::date >= '2013-07-01'
AND "EventDate"::int::date <= '2013-07-31'
AND "DontCountHits" = 0
AND "IsRefresh" = 0
AND "URL" <> ''
GROUP BY "URL"
ORDER BY PageViews DESC
LIMIT 10;
+--------------------------------------------------------------------------+-----------+
| URL | pageviews |
+--------------------------------------------------------------------------+-----------+
| http://irr.ru/index.php?showalbum/login-leniya7777294,938303130 | 102341 |
| http://komme%2F27.0.1453.116 | 51218 |
| http://irr.ru/index.php?showalbum/login-kapusta-advert2668]=0&order_by=0 | 18315 |
| http://irr.ru/index.php?showalbum/login-kapustic/product_name | 16461 |
| http://irr.ru/index.php | 12577 |
| http://irr.ru/index.php?showalbum/login | 10880 |
| http://komme%2F27.0.1453.116 Safari%2F5.0 (compatible; MSIE 9.0; | 7627 |
| http://irr.ru/index.php?showalbum/login-kupalnik | 4369 |
| http://irr.ru/index.php?showalbum/login-kapusta-advert27256.html_params | 4058 |
| http://komme%2F27.0.1453.116 Safari | 3021 |
+--------------------------------------------------------------------------+-----------+
10 row(s) fetched.
Elapsed 0.189 seconds.I tracked it down to this issue #15509 and this PR #15574, where it has been removed by mistake and the existing tests were not able to caught it. It includes queries 36 to 42. I think the confusion came from the fact that DuckDB also does not do the cast when executing the query, but it casts when creating the view:
CREATE VIEW hits AS
SELECT *
REPLACE (make_date(EventDate) AS EventDate)
FROM read_parquet('hits.parquet', binary_as_string=True);To Reproduce
ClickBench.
Expected behavior
Return data in queries 36-42.
Additional context
No response
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working