
[core] Lazy import modules to reduce importing time #4802

Open · wants to merge 16 commits into base: master
Conversation

DanielZhangQD
Contributor

@DanielZhangQD DanielZhangQD commented Feb 23, 2025

Part of #4678
This PR only covers lazy importing of the third-party modules; however, from the test results below, importing the whole sky module still takes a long time.
So this needs to be improved further, e.g.:

  • Lazy import of other unnecessary modules
  • Split some core modules further to make the import more fine-grained
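For context, the lazy-import idea can be sketched as below. This is an illustrative stand-in, not SkyPilot's actual `adaptors_common.LazyImport`; the stdlib `base64` module stands in for a heavy third-party dependency.

```python
import importlib
import types


def lazy_import(name: str) -> types.ModuleType:
    """Return a proxy that defers the real import to first attribute access."""

    class _Proxy(types.ModuleType):

        def __getattr__(self, attr):
            # importlib caches the module in sys.modules, so repeated
            # attribute access after the first one is cheap.
            module = importlib.import_module(name)
            return getattr(module, attr)

    return _Proxy(name)


# `import sky` would only pay for creating the proxy, not the real import.
base64 = lazy_import('base64')
print(base64.b64encode(b'hi'))  # real import happens here, on first use
```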

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please check the separate comments below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

@DanielZhangQD
Contributor Author

Method to verify the import time:
Execute the script with the following command:

./run_clean_import.sh --force-reinstall
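If the script is unavailable, cold import time can also be measured directly by spawning a fresh interpreter per run (a self-contained sketch; `json` stands in for `sky` here):

```python
import subprocess
import sys


def cold_import_time(module: str) -> float:
    """Time `import <module>` in a fresh interpreter (no warm sys.modules)."""
    code = (f'import time; t0 = time.perf_counter(); import {module}; '
            'print(time.perf_counter() - t0)')
    result = subprocess.run([sys.executable, '-c', code],
                            capture_output=True, text=True, check=True)
    return float(result.stdout.strip())


print(f'json cold import: {cold_import_time("json"):.4f}s')
```

`python -X importtime -c "import sky"` additionally gives a per-module breakdown.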

Result of this change:

  • macOS (ARM): total time from 5.258s to 2.053s
    • Master: [screenshot]
    • This PR: [screenshot]
  • Linux: total time from 1.595s to 0.916s
    • Master: [screenshot]
    • This PR: [screenshot]

@DanielZhangQD
Contributor Author

Hi @Michaelvll, could you please help trigger smoke_test for this PR? Thanks!

BTW, as described in the PR description, this PR only covers the third-party modules; the import time is still long and needs to be reduced further.
However, further improvement may need to touch the business logic of the core modules, e.g., splitting some complicated modules, so I'd prefer to do that in separate PRs later. What do you think?

@concretevitamin
Member

/quicktest-core

@Michaelvll
Collaborator

Thanks for submitting this PR @DanielZhangQD! Doing the core part in another PR sounds good to me.

I have some quick questions:

  1. The main goal of doing this is to speed up some of our core operations. Does making this lazy import help: (a) sky status, (b) sky launch, (c) sky exec, (d) sky queue? Since some of those operations involve SSH and run inline Python scripts that import sky, reducing the import time may help those cases. It would be better to have some profiling for this.
  2. Is there any drawback for making those third-party lazy import?

@DanielZhangQD
Contributor Author

Hi @Michaelvll, thanks for the info!
I will:

  1. Profile those core operations
  2. Yes, there may be some potential issues for the lazy import, e.g.:
  • 2.1 Race condition in multithread importing
  • 2.2 Potential broken type hints
  • 2.3 Missing or incompatible dependencies are only detected at runtime when the lazy import is triggered, not during startup
  • 2.4 The first use of a lazily imported module incurs import overhead
  • ...

For 2.1, I see that the LazyImport class already includes a lock to avoid this issue.
For the others, we must thoroughly test this PR to avoid functionality breaks and balance the benefits vs. the costs.

And I will fix the issue with quicktest-core.
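For reference, the locking pattern in question (point 2.1 above) can be sketched as follows — an illustrative double-checked-locking version, not the exact SkyPilot implementation:

```python
import importlib
import threading


class LazyImport:
    """Import a module on first attribute access, safely across threads."""

    def __init__(self, module_name: str):
        self._module_name = module_name
        self._module = None
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # __getattr__ only fires for attributes not found by normal lookup,
        # so self._module / self._lock do not recurse into this method.
        if self._module is None:
            with self._lock:
                # Double-check: another thread may have finished the import
                # while we were waiting for the lock.
                if self._module is None:
                    self._module = importlib.import_module(self._module_name)
        return getattr(self._module, name)


json_lazy = LazyImport('json')
print(json_lazy.dumps({'a': 1}))  # triggers the real import exactly once
```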

@Michaelvll
Collaborator

/quickcore-test

@Michaelvll
Collaborator

/quicktest-core

@DanielZhangQD
Contributor Author

Hi @Michaelvll, here are the profiles for the sky core operations:
master (commit 546c086) vs. this branch.

  • New clusters without setup/run
    multitime -n 5 sky launch -y --cloud kubernetes --cpus 0.1
    4678 vs master ~ 45% improvement
# 4678
===> multitime results
1: sky launch -y --cloud kubernetes --cpus 0.1
            Mean        Std.Dev.    Min         Median      Max
real        350.990     89.446      234.467     337.578     479.164     
user        2.783       0.618       2.079       2.687       3.613       
sys         0.156       0.025       0.124       0.175       0.179 
# master
===> multitime results
1: sky launch -y --cloud kubernetes --cpus 0.1
            Mean        Std.Dev.    Min         Median      Max
real        647.085     186.507     479.399     599.054     1008.504    
user        11.128      0.882       10.131      10.913      12.728      
sys         0.227       0.032       0.184       0.233       0.278 
  • New clusters with setup/run
    multitime -n 5 sky launch -y --cloud k8s task.yaml
    4678 vs master ~ 54% improvement
# 4678
===> multitime results
1: sky launch -y --cloud k8s task.yaml
            Mean        Std.Dev.    Min         Median      Max
real        363.855     184.920     222.544     265.015     722.008     
user        2.954       0.969       2.149       2.429       4.795       
sys         0.166       0.031       0.133       0.160       0.222  
# master
===> multitime results
1: sky launch -y --cloud k8s task.yaml
            Mean        Std.Dev.    Min         Median      Max
real        793.333     160.695     664.094     696.065     1019.838    
user        12.163      1.145       11.155      11.570      13.764      
sys         0.264       0.012       0.255       0.257       0.281   
  • New clusters with file mounts
    multitime -n 5 sky launch -y --cloud k8s task_file_mount.yaml

4678 vs master ~ 76% improvement

# 4678
===> multitime results
1: sky launch -y --cloud k8s task_file_mount.yaml
            Mean        Std.Dev.    Min         Median      Max
real        192.364     54.091      154.858     171.923     299.066     
user        1.771       0.246       1.625       1.665       2.263       
sys         0.123       0.014       0.108       0.123       0.148  
# master
===> multitime results
1: sky launch -y --cloud k8s task_file_mount.yaml
            Mean        Std.Dev.    Min         Median      Max
real        833.846     171.112     705.118     720.754     1075.666    
user        12.593      1.225       11.677      11.778      14.324      
sys         0.274       0.030       0.251       0.255       0.316 

sky launch -y -c mygpucluster --cloud kubernetes --cpus 0.1

  • Exec
    multitime -n 5 sky exec mygpucluster echo hi -d

4678 vs master ~ 11% improvement

# 4678
1: sky exec mygpucluster echo hi -d
            Mean        Std.Dev.    Min         Median      Max
real        3.713       0.622       2.944       4.146       4.261       
user        0.716       0.022       0.692       0.707       0.743       
sys         0.091       0.008       0.078       0.096       0.099  
# master
===> multitime results
1: sky exec mygpucluster echo hi -d
            Mean        Std.Dev.    Min         Median      Max
real        4.174       0.652       3.638       3.644       5.008       
user        7.607       0.199       7.318       7.565       7.850       
sys         0.125       0.015       0.111       0.120       0.153 
  • Launch on existing cluster
    multitime -n 5 sky launch -y -c mygpucluster echo hi

4678 vs master ~ 13% improvement

# 4678
===> multitime results
1: sky launch -y -c mygpucluster echo hi
            Mean        Std.Dev.    Min         Median      Max
real        13.674      0.307       13.464      13.535      14.281      
user        0.924       0.016       0.895       0.929       0.939       
sys         0.084       0.010       0.067       0.086       0.094   
# master
===> multitime results
1: sky launch -y -c mygpucluster echo hi
            Mean        Std.Dev.    Min         Median      Max
real        15.799      0.860       15.148      15.244      17.381      
user        7.901       0.100       7.723       7.932       8.023       
sys         0.119       0.009       0.103       0.122       0.130
  • status
    multitime -n 5 sky status

4678 vs master ~ 12% improvement

# 4678
===> multitime results
1: sky status
            Mean        Std.Dev.    Min         Median      Max
real        2.333       0.030       2.282       2.347       2.364       
user        0.649       0.018       0.625       0.651       0.676       
sys         0.078       0.009       0.063       0.080       0.088  
# master
===> multitime results
1: sky status
            Mean        Std.Dev.    Min         Median      Max
real        2.654       0.130       2.543       2.636       2.898       
user        7.683       0.102       7.518       7.735       7.777       
sys         0.112       0.016       0.096       0.108       0.141 
  • queue
    multitime -n 5 sky queue

4678 vs master ~ 14% improvement

# 4678
===> multitime results
1: sky queue
            Mean        Std.Dev.    Min         Median      Max
real        2.939       0.043       2.853       2.957       2.966       
user        0.671       0.010       0.654       0.670       0.683       
sys         0.087       0.008       0.076       0.087       0.099 
# master
===> multitime results
1: sky queue
            Mean        Std.Dev.    Min         Median      Max
real        3.442       0.057       3.338       3.448       3.504       
user        7.705       0.116       7.529       7.694       7.840       
sys         0.124       0.019       0.098       0.131       0.151   
  • task.yaml
name: minimal

resources:
  cpus: 0.1

setup: |
  echo "running setup"

run: |
  echo "running commands"
  • task_file_mount.yaml
name: minimal

resources:
  cpus: 0.1

file_mounts:
  sky/examples: ./examples

setup: |
  echo "running setup"

run: |
  echo "running commands"

@DanielZhangQD
Contributor Author

However, the new-cluster results may have only limited reference value. Most of the time was spent installing dependencies (Preparing SkyPilot runtime (2/3 - dependencies)), most likely due to network issues.

From the other profile results, this PR does not significantly reduce the core operation time cost. This is understandable, as the core operations are time-consuming, and the package import time is not a major contributor to the total time cost.

In addition, on second thought, I think the lengthy import time is primarily due to the original design of SkyPilot as a monolithic client tool, which combined all packages. With the new client-server architecture, we can refactor to separate the packages for the client and server. For the command-line tool, we should only import the client-related packages. On the server side, the import time should not be an issue. This approach will naturally resolve the import time issue for client operations.

So I think this PR is not a good solution for #4678. What do you think, @Michaelvll?

@Michaelvll
Collaborator

> However, the new-cluster results may have only limited reference value. Most of the time was spent installing dependencies (Preparing SkyPilot runtime (2/3 - dependencies)), most likely due to network issues.
>
> From the other profile results, this PR does not significantly reduce the core operation time cost. This is understandable, as the core operations are time-consuming, and the package import time is not a major contributor to the total time cost.
>
> In addition, on second thought, I think the lengthy import time is primarily due to the original design of SkyPilot as a monolithic client tool, which combined all packages. With the new client-server architecture, we can refactor to separate the packages for the client and server. For the command-line tool, we should only import the client-related packages. On the server side, the import time should not be an issue. This approach will naturally resolve the import time issue for client operations.
>
> So I think this PR is not a good solution for #4678. What do you think, @Michaelvll?

The performance improvements from 10%-60% are good!

One question I have is why the launch takes so long for the master in the tests. It should definitely be less than 600 seconds as shown above. Could we try it with 2 CPUs instead?

Yes, reducing the dependency on the client side is definitely a good approach, but there are also cases where reducing the latency of importing sky may help, e.g., in sky queue:

  • We generate the code to be run on the remote cluster:
    code = job_lib.JobLibCodeGen.get_job_queue(user_hash, all_jobs)
  • The generated code contains the import of sky:
def run_timestamp_with_globbing_payload(job_ids: List[Optional[str]]) -> str:
    """Returns the relative paths to the log files for job with globbing."""
    query_str = ' OR '.join(['job_id GLOB (?)'] * len(job_ids))
    _CURSOR.execute(
        f"""\
        SELECT * FROM jobs
        WHERE {query_str}""", job_ids)
    rows = _CURSOR.fetchall()
    run_timestamps = {}
    for row in rows:
        job_id = row[JobInfoLoc.JOB_ID.value]
        run_timestamp = row[JobInfoLoc.RUN_TIMESTAMP.value]
        run_timestamps[str(job_id)] = run_timestamp
    return message_utils.encode_payload(run_timestamps)


class JobLibCodeGen:
    """Code generator for job utility functions.

    Usage:
      >> codegen = JobLibCodeGen.add_job(...)
    """

    _PREFIX = [
        'import os',
        'import getpass',
        'from sky.skylet import job_lib, log_lib, constants',
    ]

    @classmethod
    def add_job(cls, job_name: Optional[str], username: str, run_timestamp: str,
                resources_str: str) -> str:
        if job_name is None:
            job_name = '-'
        code = [
            # We disallow job submission when SKYLET_VERSION is older than 9, as
            # it was using ray job submit before #4318, and switched to raw
            # process. Using the old skylet version will cause the job status
            # to be stuck in PENDING state or transition to FAILED_DRIVER state.
            '\nif int(constants.SKYLET_VERSION) < 9: '
            'raise RuntimeError("SkyPilot runtime is too old, which does not '
            'support submitting jobs.")',
            '\njob_id = job_lib.add_job('
            f'{job_name!r},'
            f'{username!r},'
            f'{run_timestamp!r},'
            f'{resources_str!r})',
            'print("Job ID: " + str(job_id), flush=True)',
        ]
        return cls._build(code)

    @classmethod
    def queue_job(cls, job_id: int, cmd: str) -> str:
        code = [
            'job_lib.scheduler.queue('
            f'{job_id!r},'
            f'{cmd!r})',
        ]
        return cls._build(code)

    @classmethod
    def update_status(cls) -> str:
        code = ['job_lib.update_status()']
        return cls._build(code)

    @classmethod
    def get_job_queue(cls, user_hash: Optional[str], all_jobs: bool) -> str:
        # TODO(SKY-1214): combine get_job_queue with get_job_statuses.
        code = [
            'job_queue = job_lib.dump_job_queue('
            f'{user_hash!r}, {all_jobs})',
            'print(job_queue, flush=True)',
        ]
        return cls._build(code)

In this case, if we can reduce the time for importing sky, or even just skylet, it might help reduce the time for fetching the queue. It may be worth double-checking that.

@DanielZhangQD
Contributor Author

OK, I will try the new clusters profiling with 2 cpus and double-check the improvement for the other operations, thanks!

@DanielZhangQD DanielZhangQD changed the title lazy import modules to reduce importing time [core] Lazy import modules to reduce importing time Feb 26, 2025
@DanielZhangQD
Contributor Author

Hi @Michaelvll, here are the profiles for the sky core operations with the latest code change:
master (commit 6eed3a0) vs. this branch.

For the new-cluster cases, I tried with 2 CPUs and the time cost varied between 80–500s. After running many rounds, I found that the network is the biggest contributor (mainly during dependency installation), so the results have limited reference value and I won't paste them here.

Here are the results for the other cases with the latest change:

sky launch -y -c mygpucluster --cloud kubernetes --cpus 2

  • Exec
    multitime -n 5 sky exec mygpucluster echo hi -d

4678 vs master ~ 20% improvement (in some cases, the value would be about 10%)

# 4678
===> multitime results
1: sky exec mygpucluster echo hi -d
            Mean        Std.Dev.    Min         Median      Max
real        2.747       0.057       2.642       2.766       2.811
user        0.660       0.014       0.641       0.654       0.678
sys         0.078       0.007       0.068       0.076       0.090

# master
===> multitime results
1: sky exec mygpucluster echo hi -d
            Mean        Std.Dev.    Min         Median      Max
real        3.450       0.015       3.427       3.449       3.467
user        7.866       0.042       7.794       7.887       7.906
sys         0.115       0.006       0.108       0.114       0.124
  • Launch on existing cluster
    multitime -n 5 sky launch -y -c mygpucluster echo hi

4678 vs master ~ 12% improvement

# 4678
 ===> multitime results
1: sky launch -y -c mygpucluster echo hi
            Mean        Std.Dev.    Min         Median      Max
real        13.080      0.084       12.967      13.071      13.228
user        0.667       0.003       0.664       0.667       0.671
sys         0.081       0.006       0.072       0.080       0.090
# master
===> multitime results
1: sky launch -y -c mygpucluster echo hi
            Mean        Std.Dev.    Min         Median      Max
real        14.901      0.202       14.696      14.784      15.212
user        7.572       0.496       6.582       7.830       7.852
sys         0.119       0.016       0.100       0.122       0.143
  • status
    multitime -n 5 sky status

4678 vs master ~ 11% improvement

# 4678
===> multitime results
1: sky status
            Mean        Std.Dev.    Min         Median      Max
real        2.157       0.020       2.135       2.159       2.190       
user        0.644       0.015       0.624       0.636       0.667       
sys         0.080       0.008       0.065       0.084       0.086 
# master
===> multitime results
1: sky status
            Mean        Std.Dev.    Min         Median      Max
real        2.426       0.046       2.336       2.436       2.464
user        7.755       0.058       7.649       7.765       7.826
sys         0.096       0.005       0.089       0.100       0.102
  • queue
    multitime -n 5 sky queue

4678 vs master ~ 13% improvement

# 4678
===> multitime results
1: sky queue
            Mean        Std.Dev.    Min         Median      Max
real        2.734       0.075       2.651       2.737       2.851
user        0.626       0.012       0.612       0.621       0.644
sys         0.084       0.005       0.077       0.083       0.092
# master
  ===> multitime results
1: sky queue
            Mean        Std.Dev.    Min         Median      Max
real        3.155       0.032       3.107       3.154       3.207
user        7.748       0.043       7.676       7.743       7.803
sys         0.103       0.015       0.089       0.094       0.128
  • task.yaml
name: minimal

resources:
  cpus: 2

setup: |
  echo "running setup"

run: |
  echo "running commands"
  • task_file_mount.yaml
name: minimal

resources:
  cpus: 2

file_mounts:
  sky/examples: ./examples

setup: |
  echo "running setup"

run: |
  echo "running commands"

@DanielZhangQD
Contributor Author

I submitted a new commit to lazily import requests in backend_utils, and here is the import profile with the latest change:

  • master: [screenshot]
  • This PR: [screenshot]

@DanielZhangQD
Contributor Author

Hi @Michaelvll, could you please help trigger smoke_test for this PR? Thanks!

@DanielZhangQD DanielZhangQD marked this pull request as ready for review February 27, 2025 00:12
Collaborator

@Michaelvll Michaelvll left a comment

Thanks @DanielZhangQD! The PR looks mostly good to me. Please find the comments below.

    max_backoff_factor=MAX_BACKOFF_FACTOR)
for i in range(MAX_ATTEMPTS):
    if method == 'get':
        response = self.session.get(url, headers=headers)
Collaborator

Can we add more comments to elaborate on why we don't use `requests.` here?

Contributor Author

Good catch! I originally thought that using session here would avoid triggering the import of requests early; however, after further testing, using requests here does not trigger the import early, so I will revert the change here and only keep the lazy import change.

import requests
if typing.TYPE_CHECKING:
    import ssl

Collaborator

Suggested change

Contributor Author

Removed

Contributor Author

This line was added by format.sh.

return _get_system_memory_gb() // constants.CONTROLLER_MEMORY_USAGE_GB


# Only calculate these when actually needed
Collaborator

Suggested change
# Only calculate these when actually needed
# Only calculate these when actually needed to avoid importing psutil when not necessary during `import sky`

Contributor Author

Updated

Comment on lines 150 to 151
if len(serve_state.get_services()) >= serve_utils.get_num_service_threshold(
):
Collaborator

Suggested change
if len(serve_state.get_services()) >= serve_utils.get_num_service_threshold(
):
if (len(serve_state.get_services()) >= serve_utils.get_num_service_threshold()):

Add () to avoid the formatter splitting lines in a weird way.

Contributor Author

Updated

Comment on lines 64 to 69
def get_num_service_threshold():
"""Get number of services threshold, calculating it only when needed."""
global _num_service_threshold
if _num_service_threshold is None:
_num_service_threshold = _get_num_service_threshold()
return _num_service_threshold
Collaborator

We should avoid shallow functions like _get_num_service_threshold and _get_system_memory_gb.

Suggested change
def get_num_service_threshold():
    """Get number of services threshold, calculating it only when needed."""
    global _num_service_threshold
    if _num_service_threshold is None:
        _num_service_threshold = _get_num_service_threshold()
    return _num_service_threshold
def get_num_service_threshold():
    """Get number of services threshold, calculating it only when needed."""
    global _num_service_threshold
    if _num_service_threshold is None:
        system_memory_gb = psutil.virtual_memory().total // (1024**3)
        return system_memory_gb // constants.CONTROLLER_MEMORY_USAGE_GB
    return _num_service_threshold

Also, would it be simpler to use lru_cache instead of handling the global variable?

Suggested change
def get_num_service_threshold():
    """Get number of services threshold, calculating it only when needed."""
    global _num_service_threshold
    if _num_service_threshold is None:
        _num_service_threshold = _get_num_service_threshold()
    return _num_service_threshold
@annotations.lru_cache(scope='request')
def get_num_service_threshold():
    """Get number of services threshold, calculating it only when needed."""
    system_memory_gb = psutil.virtual_memory().total // (1024**3)
    return system_memory_gb // constants.CONTROLLER_MEMORY_USAGE_GB
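For reference, the stdlib equivalent of the suggested pattern looks like the sketch below. `annotations.lru_cache(scope='request')` is SkyPilot's own wrapper; here plain `functools.lru_cache` is used, and `os.cpu_count()` stands in for the `psutil` memory query so the snippet has no third-party dependency:

```python
import functools
import os


@functools.lru_cache(maxsize=1)
def get_num_service_threshold() -> int:
    """Computed once on first call; later calls return the cached value."""
    # The real code would use psutil.virtual_memory().total // (1024**3)
    # divided by constants.CONTROLLER_MEMORY_USAGE_GB.
    return max((os.cpu_count() or 1) // 2, 1)


get_num_service_threshold()
get_num_service_threshold()
print(get_num_service_threshold.cache_info().hits)  # second call hit the cache
```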

Contributor Author

OK, updated

_console = None # Lazy initialized console


# Move global console to a function
Collaborator

Suggested change
# Move global console to a function
# Move global console to a function to avoid importing rich console if not used

Contributor Author

Updated

Comment on lines 20 to 30
_console = None  # Lazy initialized console


# Move global console to a function
def get_console():
    """Get or create the rich console."""
    global _console
    if _console is None:
        _console = rich_console.Console(soft_wrap=True)
    return _console
Collaborator

Actually, should we reuse the one in ux_utils

Contributor Author

The console here is a different instance from the one in ux_utils.py, and they are initialized with different parameters: rich_console.Console(soft_wrap=True) for this one and rich_console.Console() for the one in ux_utils.py. Should we just use one of them? If so, should soft_wrap be set to True?

Collaborator

I think setting soft_wrap to always be True should be fine.

Contributor Author

OK, thanks!

@DanielZhangQD
Contributor Author

Hi @Michaelvll, comments addressed except for this one: #4802 (comment). If we decide to use one console instance, I think we can move the console management to a separate module, e.g., console_utils.py. What do you think?
PTAL again, thanks!

@Michaelvll
Collaborator

/smoke-test

@Michaelvll
Collaborator

/smoke-test

Collaborator

@Michaelvll Michaelvll left a comment

Thanks @DanielZhangQD! It looks mostly good to me.

Can we add a unit test to make sure that we don't add those imports back in future edits, i.e., we should check that import sky does not immediately import the packages that we have changed to LazyImport.
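Such a test needs a fresh interpreter so that modules imported by earlier tests don't leak into sys.modules. A minimal sketch (with stdlib stand-ins: `json` in place of `sky`, `ftplib` in place of a lazily imported dependency):

```python
import subprocess
import sys


def assert_not_imported(root_module: str, lazy_module: str) -> None:
    """Check that importing `root_module` does not pull in `lazy_module`."""
    code = (f'import sys; import {root_module}; '
            f'assert {lazy_module!r} not in sys.modules, '
            f'"{lazy_module} was imported eagerly"')
    # A fresh interpreter guarantees a cold sys.modules.
    subprocess.run([sys.executable, '-c', code], check=True)


# With the real package this would be e.g. assert_not_imported('sky', 'yaml').
assert_not_imported('json', 'ftplib')
print('ok')
```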

filelock = adaptors_common.LazyImport('filelock')
yaml = adaptors_common.LazyImport('yaml')
Collaborator

It would be good to mention the time scale of the imports here in the comments, so that future maintenance can use them as a reference.

Contributor Author

Do you mean commenting the time of non-lazily importing filelock & yaml here, or the time of importing authentication after this change?

Contributor Author

I have added comments for the third-party modules in sky/adaptors/common.py and descriptions in the contributing guide.

@DanielZhangQD
Contributor Author

> Thanks @DanielZhangQD! It looks mostly good to me.
>
> Can we add a unit test to make sure that we don't add those imports back in future edits, i.e., we should check that import sky does not immediately import the packages that we have changed to LazyImport.

OK, I will add UT for this.

* Examples
* Documentation
* Tutorials, blog posts and talks on SkyPilot
- [Bug reports](https://github.com/skypilot-org/skypilot/issues) and [discussions](https://github.com/skypilot-org/skypilot/discussions)
Contributor Author

These kinds of changes are done by auto format tool.

@DanielZhangQD
Contributor Author

Hi @Michaelvll, the following items have been done:

  • Add UT case to avoid lazy import being reverted
  • Update comments in sky/adaptors/common.py
  • Add descriptions in the contribution guide

PTAL again.

BTW, could you please help trigger the smoke test again? Thanks!

@cg505 cg505 requested a review from Michaelvll March 5, 2025 19:56
Collaborator

@aylei aylei left a comment

Awesome work @DanielZhangQD! Just some minor comments.

@@ -219,7 +222,7 @@ def client_status(msg: str) -> Union['rich_console.Status', _NoOpConsoleStatus]:
if (threading.current_thread() is threading.main_thread() and
not sky_logging.is_silent()):
if _statuses['client'] is None:
_statuses['client'] = console.status(msg)
_statuses['client'] = console_utils.get_console().status(msg)
Collaborator

Can we simply put get_console in this file?

Contributor Author

Both rich_utils and ux_utils use get_console, so putting it in this file would cause a circular import.

assert 'jsonschema' not in sys.modules
assert 'pendulum' not in sys.modules

# Check that the lazy imported modules are imported after used
Collaborator

Not sure whether the following assertions are behaviors we want to keep. To me, which modules are used by a function seems like an easy-to-change implementation detail. Maybe the existing unit tests of these functions themselves would be sufficient to ensure they still work after this PR.

Contributor Author

Makes sense, I will remove the checks below.

# Clean modules that are lazy imported by sky
modules_to_clean = [
    module for module in list(sys.modules.keys()) if any(
        module.startswith(prefix) for prefix in [
Collaborator

Somewhat hard to maintain. IIUC, our intention is that a module wrapped in LazyImport should be lazily imported everywhere (otherwise lazy import just does not work). Can we just list all instances of the LazyImport class and assert that these modules are not actually imported? e.g.

objs = [obj for obj in gc.get_objects() if isinstance(obj, adaptor_common.LazyImport)]
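Expanded into a runnable sketch (with a minimal stand-in for `adaptor_common.LazyImport`, and the stdlib `ftplib` standing in for a lazily imported dependency):

```python
import gc
import sys


class LazyImport:
    """Minimal stand-in: records the target module name, imports nothing."""

    def __init__(self, module_name: str):
        self.module_name = module_name


ftplib = LazyImport('ftplib')  # would normally wrap a heavy dependency

# Enumerate every LazyImport instance alive in the process...
lazy_objs = [obj for obj in gc.get_objects() if isinstance(obj, LazyImport)]
lazy_names = {obj.module_name for obj in lazy_objs}

# ...and assert that none of them has actually been imported yet.
for name in lazy_names:
    assert name not in sys.modules, f'{name} was imported eagerly'
print(sorted(lazy_names))
```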

Contributor Author

@Michaelvll What do you think? Is the above way good enough?

@DanielZhangQD
Contributor Author

Hi @Michaelvll @aylei, comments addressed, PTAL again.
And could you please help trigger the smoke test again? Thanks!

@aylei
Collaborator

aylei commented Mar 7, 2025

/smoke-test --aws


# Check that the lazy imported modules are in the list
for module in lazy_import_modules:
    assert module in lazy_module_names, f"Module {module} is not lazy imported"
Collaborator

I think we need to assert that the module is not in sys.modules. Also, only one of lazy_module_names or lazy_import_modules should be kept, depending on the intent:

  • Keeping lazy_import_modules: we want the listed modules to be lazily imported, regardless of how that is implemented
  • Keeping lazy_module_names: we want every module that has been wrapped in adaptor_common.LazyImport() to actually be lazily imported

Contributor Author

OK, I got your point. I'll go ahead and revise the case.

Contributor Author

@aylei Case updated, PTAL again, thanks!

@DanielZhangQD
Contributor Author

Hi @Michaelvll @aylei, the UT case has been updated to address the above comments, PTAL again.
For the smoke test failures, I have checked several of them with help from @aylei, and it seems the failures are not related to this PR; please help double-check.
Thanks for your help!

]


@pytest.fixture(autouse=True)
Collaborator

IIRC, an autouse fixture will be called for every test function. It looks like we only need this for test_sky_import?

Contributor Author

It will not be called by cases in other test files, and I will remove the autouse in case we add extra cases in the same file.

Collaborator

@aylei aylei left a comment

lgtm, thanks!

@aylei
Collaborator

aylei commented Mar 7, 2025

/smoke-test

@DanielZhangQD
Contributor Author

Hi @Michaelvll, Conflicts resolved, PTAL again, thanks!

@Michaelvll
Collaborator

/smoke-test --aws

@Michaelvll
Collaborator

/smoke-test --aws


Successfully merging this pull request may close these issues.

import sky almost imports all subpackages and adds 1s overhead to all operations