Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Nebius] Nebius Object Storage support. #4838

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

SalikovAlex
Copy link
Contributor

@SalikovAlex SalikovAlex commented Feb 27, 2025

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    pytest tests/smoke_tests/test_mount_and_storage.py::test_nebius_storage_mounts --nebius
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests:
    pytest tests/smoke_tests/test_mount_and_storage.py::test_nebius_storage_mounts --nebius
  • pytest tests/smoke_tests/test_mount_and_storage.py::TestStorageWithCredentials::test_externally_created_bucket_mount_without_source --nebius

This commit introduces support for Nebius object storage, enabling users to integrate Nebius storage with various functionalities such as file mounting, syncing, and cloud transfers. It includes necessary adaptors, utility methods, and updates to the storage framework to handle Nebius-specific configurations and endpoints.
@SalikovAlex SalikovAlex marked this pull request as draft February 27, 2025 14:19
Implemented new Nebius storage mounting functionality, including COPY and MOUNT modes, with tests. Updated cloud_stores.py for AWS CLI compatibility and added a YAML template for Nebius storage configurations. Removed outdated Nebius test case in favor of the new approach.
This commit introduces comprehensive Nebius support, making it accessible for S3-compatible operations including bucket creation, deletion, and mounting. It removes reliance on the AWS_SHARED_CREDENTIALS_FILE environment variable, streamlines Nebius-specific configurations, and adds associated unit test parameters to validate functionality across storage operations.
@SalikovAlex SalikovAlex marked this pull request as ready for review February 28, 2025 19:47
@Michaelvll Michaelvll requested a review from cblmemo February 28, 2025 21:58
Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this @SalikovAlex ! This is exciting. Left some discussions ;)

Remove redundant code, streamline imports, and enhance error messaging. Adjust documentation for better accuracy and update function annotations. These changes improve maintainability and readability of the Nebius adaptor module.
Simplified Nebius AWS CLI installation by reusing the S3CloudStorage configuration for consistency. Removed unnecessary debug print in `run_upload_cli` to reduce console noise. Minor formatting adjustment in test YAML file.
Clean up and improve code readability, including string formatting and conditionals. Introduce `_TIMEOUT_TO_PROPAGATES` to handle timeout while verifying Nebius bucket deletions. Update comments to reflect the corrected usage on Nebius servers.
Removed unused variable from subprocess call to clean up code. Updated timeout error to include the bucket name for more detailed and helpful error reporting.
Updated the helper method to assign a default region when no region is specified, ensuring compatibility with Nebius Object Storage. This change avoids potential errors caused by missing region values.
Replace 'nebius://' with 's3://' in source paths to ensure compatibility with AWS CLI commands. This allows seamless integration of Nebius storage endpoints.
Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @SalikovAlex ! The PR mostly looks good to me. Left some discussions - after that it should be ready to go!

Could you also help run the related smoke test on nebius?

# To increase parallelism, modify max_concurrent_requests in your
# aws config file (Default path: ~/.aws/config).
endpoint_url = nebius.create_endpoint()
if 'nebius://' in source:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it starts with s3://, isn't it be treated as S3 storage and route to the S3Store class?

Updated the `create_endpoint` function to ensure the `region` parameter is strictly typed as `str`. Modified `create_nebius_client` to accept `None` as the default region. Additionally, corrected the error message in `storage.py` to specify 'nebius' instead of 's3'.
Updated R2 command to explicitly set AWS_SHARED_CREDENTIALS_FILE for better credential management. Simplified region assignment logic in storage initialization to improve code readability and maintainability.
@SalikovAlex
Copy link
Contributor Author

Looks like you are right.
I am trying to guess while in R2 we are using

if 'r2://' in source:
    source = source.replace('r2://', 's3://')

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SalikovAlex for fixing this! Left final comments. After addressing those, could you help running the related smoke test on your side?

# To increase parallelism, modify max_concurrent_requests in your
# aws config file (Default path: ~/.aws/config).
endpoint_url = nebius.create_endpoint()
if 'nebius://' in source:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image I think here we also assume it can only start with `nebius://`

Ensure Nebius paths are properly validated and transformed, replacing `if` checks with assertions. Fixed default region handling in `create_endpoint` and corrected variable naming in `split_nebius_path` for consistency. These changes enhance code reliability and maintainability.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants