Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more variants to ErrorKind #1038

Open
1 of 3 tasks
connortsui20 opened this issue Mar 3, 2025 · 7 comments
Open
1 of 3 tasks

Add more variants to ErrorKind #1038

connortsui20 opened this issue Mar 3, 2025 · 7 comments
Labels
enhancement New feature or request

Comments

@connortsui20
Copy link
Contributor

Is your feature request related to a problem or challenge?

As of now, ErrorKind looks like this (doc comments removed):

pub enum ErrorKind {
    Unexpected,
    DataInvalid,
    FeatureUnsupported,
}

This is pretty bare, especially since there are whole classes of things that can go wrong with respect to the Iceberg Catalog trait. See #965 and this comment for some context for this specific repo, as well as the official REST API specification.

It would be nice if there was a dedicated error type for Catalog errors. Some simple things that could be added are variants for tables or namespaces that don't exist, or duplicate tables or namespaces. There are many other things that can go wrong. Right now #965 just uses string messages to represent those errors, which is not ideal.

Adding dedicated error types would also mean the different implementations of the Iceberg Catalog could be more easily unified: the exact return behavior of each method would be the same among every implementation (instead of having a different error message depending on which implementation you choose).

Describe the solution you'd like

I think there are many ways to do this, some better than others. I would say that the current method manually implementing error messages, backtrace, and variants is somewhat shaky. Personally, I think that adding thiserror or snafu as a dependency for this would ease development going forward, not just the error types specified above (for the catalog) but for any other error kinds that Iceberg needs / may introduce.

Willingness to contribute

  • I can contribute to this feature independently
  • I would be willing to contribute to this feature with guidance from the Iceberg Rust community
  • I cannot contribute to this feature at this time
@liurenjie1024
Copy link
Contributor

Thanks @connortsui20 for raising this. I think the motivation behind is that when we add an ErrorKind, we need to think if we could handle it, otherwise we just need to put extra information in context. cc @Xuanwo who is the expert on this.

@Xuanwo
Copy link
Member

Xuanwo commented Mar 12, 2025

Adding a separate error type for the catalog doesn't make it easier for users; instead, it adds more work on their end.

At this stage, I think it would be good to have ErrorKind::TableNotFound and ErrorKind::NamespaceExists. Users can easily handle these errors and obtain detailed information from the error message or context.

@connortsui20
Copy link
Contributor Author

Adding a separate error type for the catalog doesn't make it easier for users; instead, it adds more work on their end.

At this stage, I think it would be good to have ErrorKind::TableNotFound and ErrorKind::NamespaceExists. Users can easily handle these errors and obtain detailed information from the error message or context.

Sorry I should have been clear, I meant that there could be a dedicated Catalog error kind that could be a variant of ErrorKind. For example:

pub enum ErrorKind {
    Unexpected,
    Catalog(CatalogErrorKind),
    DataInvalid,
    FeatureUnsupported,
}

pub enum CatalogErrorKind {
    NoSuchNamespace,
    NoSuchTable,
    NamespaceAlreadyExists,
    TableAlreadyExists,
}

There's also potentially a NamespaceNotEmpty variant under CatalogErrorKind that should be there, see apache/iceberg#12502

And once table update is implemented, there should be CommitFailed and CommitStateUnknown variants.

@jonathanc-n
Copy link
Contributor

I agree with @connortsui20 's idea. The current possible enums for ErrorKind is too bare, I think a NotFound error would be pretty useful (works well as a generalized error type and additional context can be given)

@liurenjie1024
Copy link
Contributor

I think this is an interesting problem to discuss. iceberg-rust is now a multi crate project, so maybe it's worth rethinking the design of error handling. But I think we should not only think about catalog error here, we need an extensible approach for it.

@connortsui20
Copy link
Contributor Author

I'll circle back to one of the ideas I brought up: using thiserror or snafu to help build better error handling. Using either would provide strictly more functionality and would not lose any of the expression (that I can see).

And as a side note, I think that the current methodology of constructing an empty Error and adding static context is not very robust. Doing a quick search through this entire repository, I don't really see such a large number of unique situations that would justify needing to dynamically specify these context fields. It makes a lot more sense to define errors over enum variants, where each variant can decide what fields it needs.

@Xuanwo
Copy link
Member

Xuanwo commented Mar 13, 2025

Before continuing the discussion, I would like to first explain our current error design goals. In general, error design needs to serve the following objectives:

  • The implementer does not need to do extra work.
  • The caller can know what error has occurred.
  • The caller can decide how to handle this error.
  • The caller can understand why this error occurred.

Iceberg's error provides a clear ErrorKind, a concise summary message, and detailed context. Callers can know what happened, decide how to handle it and understand why this error occurred. It also reduce our efforts to decide while error to return while using.

This design shifts our focus from ourselves to our end users. We generally do not return errors like serde_json failure or io.EOF. Instead, we consider whether the error is meaningful to users and contributes to their error handling.

I still think it would be better if we had ErrorKind::TableNotFound, categorizing them as noun-adj. Regardless of which APIs users are calling, they would know what happened.

I propose to add:

pub enum ErrorKind {
    Unexpected,
    DataInvalid,
    FeatureUnsupported,
    
+    NamespaceNotFound,
+    NamespaceExists,
+    TableNotFound,
+    TableExists,
}

Sorry I should have been clear, I meant that there could be a dedicated Catalog error kind that could be a variant of ErrorKind. For example:

Hi, a dedicated catalog error kind doesn't look good to me.

Let's try write code from our users side:

let table = match ctl.load_table("test").await {
  Ok(table) => table,
  Err(err) if err.kind() == ErrorKind::TableNotExists => {
    // handle table not exists, like gerenate a user-friendly error message
  }
  Err(err) => {
    // unexpected things happend, we need to log or return bacl.
  }
} 

If we have different layers of error kind:

let table = match ctl.load_table("test").await {
  Ok(table) => table,
  Err(err) if err.kind() == ErrorKind::Catalog(CatalogErrorKind::NoSuchTable) => {
    // handle table not exists, like gerenate a user-friendly error message
  }
  Err(err) => {
    // unexpected things happend, we need to log or return bacl.
  }
} 

Users need to be aware that it's a catalog error first. However, it's not something they need to take action on.

Add more errors variants doesn't serve our error handling goal too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants