Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expr simplifier doesn't simplify exprs that are same if you swap lhs with rhs regardless of cycles #14943

Open
ion-elgreco opened this issue Feb 28, 2025 · 2 comments
Labels
bug Something isn't working help wanted Extra attention is needed performance Make DataFusion faster

Comments

@ion-elgreco
Copy link

ion-elgreco commented Feb 28, 2025

Describe the bug

I was doing some improvements to merge in delta-rs to add an expr simplifier on our early pruning filter. While doing this I noticed that these type of expressions: s.foo = 'a' and 'a' = s.foo are not getting simplified into s.foo='a' directly but rather in s.foo='a' and s.foo='a' even when you set max cycles to ludicrous number of 100.

If the lhs and rhs can be swapped and then it's equal to an existing expression, it should be removed in one cicle.

See one of our logs after using the simplifier:

Before simplifying

[crates/core/src/operations/merge/mod.rs:864:5] commit_predicate.clone() = Some(
    "unique_row_hash BETWEEN 'new_hash' AND 'new_hash' AND 202502 = month_id AND month_id = 202502 AND 20250226 = date_id AND date_id = 20250226",
)

After simplifying once with max cycles 100

[crates/core/src/operations/merge/mod.rs:886:5] commit_predicate.clone() = Some(
    "unique_row_hash >= 'new_hash' AND unique_row_hash <= 'new_hash' AND month_id = 202502 AND month_id = 202502 AND date_id = 20250226 AND date_id = 20250226",
)

To Reproduce

This is the starting expression.

Some(
    BinaryExpr(
        BinaryExpr {
            left: BinaryExpr(
                BinaryExpr {
                    left: BinaryExpr(
                        BinaryExpr {
                            left: Column(
                                Column {
                                    relation: Some(
                                        Bare {
                                            table: "t",
                                        },
                                    ),
                                    name: "unique_row_hash",
                                },
                            ),
                            op: GtEq,
                            right: Literal(
                                Utf8("new_hash"),
                            ),
                        },
                    ),
                    op: And,
                    right: BinaryExpr(
                        BinaryExpr {
                            left: Column(
                                Column {
                                    relation: Some(
                                        Bare {
                                            table: "t",
                                        },
                                    ),
                                    name: "unique_row_hash",
                                },
                            ),
                            op: LtEq,
                            right: Literal(
                                Utf8("new_hash"),
                            ),
                        },
                    ),
                },
            ),
            op: And,
            right: BinaryExpr(
                BinaryExpr {
                    left: BinaryExpr(
                        BinaryExpr {
                            left: BinaryExpr(
                                BinaryExpr {
                                    left: BinaryExpr(
                                        BinaryExpr {
                                            left: Column(
                                                Column {
                                                    relation: Some(
                                                        Bare {
                                                            table: "t",
                                                        },
                                                    ),
                                                    name: "month_id",
                                                },
                                            ),
                                            op: Eq,
                                            right: Literal(
                                                Int32(202502),
                                            ),
                                        },
                                    ),
                                    op: And,
                                    right: BinaryExpr(
                                        BinaryExpr {
                                            left: Column(
                                                Column {
                                                    relation: Some(
                                                        Bare {
                                                            table: "t",
                                                        },
                                                    ),
                                                    name: "month_id",
                                                },
                                            ),
                                            op: Eq,
                                            right: Literal(
                                                Int64(202502),
                                            ),
                                        },
                                    ),
                                },
                            ),
                            op: And,
                            right: BinaryExpr(
                                BinaryExpr {
                                    left: Column(
                                        Column {
                                            relation: Some(
                                                Bare {
                                                    table: "t",
                                                },
                                            ),
                                            name: "date_id",
                                        },
                                    ),
                                    op: Eq,
                                    right: Literal(
                                        Int32(20250226),
                                    ),
                                },
                            ),
                        },
                    ),
                    op: And,
                    right: BinaryExpr(
                        BinaryExpr {
                            left: Column(
                                Column {
                                    relation: Some(
                                        Bare {
                                            table: "t",
                                        },
                                    ),
                                    name: "date_id",
                                },
                            ),
                            op: Eq,
                            right: Literal(
                                Int64(20250226),
                            ),
                        },
                    ),
                },
            ),
        },
    ),
)

Expected behavior

Simplify further

Additional context

No response

@ion-elgreco ion-elgreco added the bug Something isn't working label Feb 28, 2025
@alamb
Copy link
Contributor

alamb commented Feb 28, 2025

I think this would be a nice improvement

There are code and existing tests that can simplify this type of expression. See tests here

Maybe we need to improve the logic somehow

This would be a good thing for someone to look into

@alamb alamb added help wanted Extra attention is needed performance Make DataFusion faster labels Feb 28, 2025
@ion-elgreco
Copy link
Author

@alamb something to note, even after manually applying the optimizer again it doesn't simply foo = 5 and foo = 5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed performance Make DataFusion faster
Projects
None yet
Development

No branches or pull requests

2 participants