Support pushdown filters for non-cast date conversion functions (e.g. to_date) #933
Comments
Thanks for creating this issue @omerhadari. Is this something you would like to contribute?
Yes, I would like to take a shot at this. I will be free a bit later this week to try, if that's ok?
@omerhadari That would be great, feel free to ping me for a review or reach out in case of any questions 👍
Thanks @omerhadari for raising this. There are some blocking issues to supporting this feature: iceberg-rust's expression model follows the Java implementation and is quite limited, so it's a little difficult to do this in the core crate. I think an easy way to do this would be to add a transform to convert … There are other long-term solutions to support more scalar functions, but they require another design.
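One way to picture the transform idea above is rewriting a day-level equality on `TO_DATE(ts)` into a half-open range on `ts` itself, which a plain column/literal predicate can express. The exact rewrite proposed in the comment is not preserved here, so what follows is only a minimal sketch under assumptions: `Pred`, `days_from_civil`, `parse_date_to_days`, and `day_equality_to_range` are hypothetical names (not iceberg-rust or DataFusion APIs), the column is assumed to be a timestamp without time zone stored as microseconds since the Unix epoch, and day boundaries are taken in UTC.

```rust
/// Toy predicate type for illustration only; this is NOT iceberg-rust's `Predicate`.
#[derive(Debug, PartialEq)]
enum Pred {
    GreaterThanOrEq(String, i64),
    LessThan(String, i64),
    And(Box<Pred>, Box<Pred>),
}

const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Days between 1970-01-01 and a proleptic Gregorian date
/// (the well-known "days from civil" algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, in [0, 399]
    let mp = if month > 2 { month - 3 } else { month + 9 }; // March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day of (March-based) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parse "YYYY-MM-DD" into days since the epoch. Only loose range checks
/// are done; a real implementation would validate the calendar date.
fn parse_date_to_days(s: &str) -> Option<i64> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(days_from_civil(y, m, d))
}

/// Rewrite `TO_DATE(column) = date` as
/// `column >= start_of_day AND column < start_of_next_day` (in microseconds).
fn day_equality_to_range(column: &str, date: &str) -> Option<Pred> {
    let start = parse_date_to_days(date)?.checked_mul(MICROS_PER_DAY)?;
    let end = start.checked_add(MICROS_PER_DAY)?;
    Some(Pred::And(
        Box::new(Pred::GreaterThanOrEq(column.to_string(), start)),
        Box::new(Pred::LessThan(column.to_string(), end)),
    ))
}
```

Calling `day_equality_to_range("timestamp_col", "2025-01-01")` yields a conjunction bounded by midnight of 2025-01-01 and midnight of 2025-01-02. A rewrite like this only covers day-resolution comparisons, and a time-zone-aware column would need the session time zone taken into account.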
Thank you @liurenjie1024 for the elaboration! Is this an issue in the Java implementation as well, or does it have a way to express functions? I'm copying a comment from my PR because it may make more sense to discuss it in the issue. Note the point about how … Regarding your suggested alternative calculation, this is actually what I did on my side to work around the issue, but I didn't want to implement it here because I'm new to the project and did not know if it is too workaround-y. Here is my comment from the PR itself: I wanted to ask, is there a way to express functions within iceberg predicates? Is this even desired? The reason this could be beneficial is that sometimes you need access to the column value, and then you could perform much better manifest elimination. A few examples I have in this context:
This also reveals what I think is a bug. In … If I understand correctly, this could cause wrong results; for example, the query … would result in the predicate … I would appreciate some guidance on how to tackle this issue of propagating more information. I don't think it makes sense in the scope of this PR, but maybe I am missing something basic.
Mainly, I think I accidentally stepped into a rabbit hole and need some help scoping this issue and the PR. Here is a suggestion; please let me know if these make sense as achievable goals within the scope:
Out of scope:
@omerhadari I think that's a great first step. When you start doing
Updated the PR according to this set of problems for now. It doesn't solve the entire issue, but I am not sure I feel comfortable with the approach @liurenjie1024 suggested for dealing with dates, despite it being logically sound. I think it adds a lot of unexpected complexity and is only a solution for a subset of the issue (only day-resolution comparisons).
Currently there is no way to express functions within iceberg predicates, which is also a problem in iceberg-java/iceberg-python. I'm not sure why this doesn't exist. cc @Fokko
Currently, for queries that compare timestamps/dates using `TO_DATE` (for example, to truncate a timestamp column), no pushdown predicates are applied. This is because the functions `TO_DATE` and `TO_TIMESTAMP` are not casts, so they reach the function `to_iceberg_predicate` as `ScalarFunction`s and are not matched on any branch.

It would be nice to identify these cases, because it is very common to have a partition key on a timestamp column with a Day transform, and currently queries such as

SELECT * FROM table WHERE TO_DATE(table.timestamp_col) = '2025-01-01'

result in no pushdown predicates at all.

Something along the lines of these tests would ideally pass: