Remove redundant statistics from FileScanConfig #14937

Open
alamb opened this issue Feb 28, 2025 · 1 comment · May be fixed by #14955
Labels: enhancement (New feature or request)

Comments

alamb (Contributor) commented Feb 28, 2025

Is your feature request related to a problem or challenge?

`FileScanConfig` has statistics (`FileScanConfig::statistics`), but so does its `file_source`:

```rust
/// Estimated overall statistics of the files, taking `filters` into account.
/// Defaults to [`Statistics::new_unknown`].
pub statistics: Statistics,
```

and the `FileSource` trait has:

```rust
/// Return projected statistics
fn statistics(&self) -> datafusion_common::Result<Statistics>;
```

Having two sets of statistics means that:

  1. there is potential for bugs when the two copies get out of sync, as happened in #14679 (bug: Physical plan round trip fails in some cases after datasource refactor)
  2. planning is slower than necessary, because both copies have to be maintained
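To illustrate the first point, here is a minimal sketch of how two independently stored copies can drift apart. `Statistics`, `FileSource`, `ExampleSource`, and `DuplicatedConfig` are hypothetical, simplified stand-ins, not the real DataFusion types, and the "update only one copy" step is an assumed example rather than the actual cause of #14679.

```rust
// Hypothetical, simplified stand-ins; NOT the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    // `None` means the row count is unknown.
    num_rows: Option<usize>,
}

impl Statistics {
    fn new_unknown() -> Self {
        Statistics { num_rows: None }
    }
}

trait FileSource {
    fn statistics(&self) -> Statistics;
}

// A hypothetical source that carries its own statistics.
struct ExampleSource {
    stats: Statistics,
}

impl FileSource for ExampleSource {
    fn statistics(&self) -> Statistics {
        self.stats.clone()
    }
}

// Shape of the problem: the config keeps its own copy of the statistics
// in addition to the one reachable through `file_source`.
struct DuplicatedConfig {
    statistics: Statistics,
    file_source: Box<dyn FileSource>,
}

fn main() {
    let source = ExampleSource {
        stats: Statistics { num_rows: Some(100) },
    };
    // Both copies start out identical...
    let mut config = DuplicatedConfig {
        statistics: source.statistics(),
        file_source: Box::new(source),
    };
    assert_eq!(config.statistics, config.file_source.statistics());

    // ...until some (hypothetical) code path updates only one of them.
    config.statistics = Statistics::new_unknown();
    assert_ne!(config.statistics, config.file_source.statistics());
    println!(
        "config: {:?}, source: {:?}",
        config.statistics,
        config.file_source.statistics()
    );
}
```

Nothing in this shape forces the two fields to agree, so every code path that touches one copy has to remember the other.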

Describe the solution you'd like

It would be nice to remove the duplication, so that it is clear there is only a single set of statistics (held on the `DataSource`).
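One possible shape for this, sketched under assumptions: the config stores no statistics of its own and delegates to its source, and "setting" statistics produces a new source. `SingleStatsConfig`, `ExampleSource`, and the `with_statistics` builder are hypothetical names for illustration, not the existing DataFusion API or the approach taken in #14955.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins; NOT the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
}

trait FileSource {
    fn statistics(&self) -> Statistics;
    // Hypothetical builder: return a new source carrying `stats`.
    fn with_statistics(&self, stats: Statistics) -> Arc<dyn FileSource>;
}

struct ExampleSource {
    stats: Statistics,
}

impl FileSource for ExampleSource {
    fn statistics(&self) -> Statistics {
        self.stats.clone()
    }
    fn with_statistics(&self, stats: Statistics) -> Arc<dyn FileSource> {
        Arc::new(ExampleSource { stats })
    }
}

// The config holds no statistics field; the source is the single owner.
struct SingleStatsConfig {
    file_source: Arc<dyn FileSource>,
}

impl SingleStatsConfig {
    // Always read through the source, so there is nothing to keep in sync.
    fn statistics(&self) -> Statistics {
        self.file_source.statistics()
    }

    // Updating statistics replaces the source instead of a second copy.
    fn with_statistics(mut self, stats: Statistics) -> Self {
        self.file_source = self.file_source.with_statistics(stats);
        self
    }
}

fn main() {
    let config = SingleStatsConfig {
        file_source: Arc::new(ExampleSource {
            stats: Statistics { num_rows: Some(100) },
        }),
    };
    let config = config.with_statistics(Statistics { num_rows: Some(42) });
    // Config and source cannot disagree: there is only one copy.
    assert_eq!(config.statistics(), config.file_source.statistics());
    println!("{:?}", config.statistics());
}
```

The trade-off is that every statistics update rebuilds the source, but the invariant "one set of statistics" is enforced by the types rather than by convention.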

Describe alternatives you've considered

No response

Additional context

No response

alamb added the enhancement label Feb 28, 2025
Standing-Man (Contributor) commented:

take
