-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Lazy import modules to reduce importing time #4802
base: master
Are you sure you want to change the base?
Conversation
Method to verify the import time:
Result of this change:
![]()
![]()
![]()
![]() |
Hi @Michaelvll, could you please help trigger smoke_test for this PR? Thanks! BTW, as I have described in the PR description, this PR only covers the 3rd party modules and the import time is still long and needs to be reduced further. |
/quicktest-core |
Thanks for submitting this PR @DanielZhangQD! Doing the core part inanother PR sounds good to me. I have some quick questions:
|
Hi @Michaelvll, thanks for the info!
For 2.1, I see And I will fix the issue with quicktest-core. |
/quickcore-test |
/quicktest-core |
Hi @Michaelvll, here are the profiles for the sky core operations:
4678 vs master ~ 76% improvement
sky launch -y -c mygpucluster --cloud kubernetes --cpus 0.1
4678 vs master ~ 11% improvement
4678 vs master ~ 13% improvement
4678 vs master ~ 12% improvement
4678 vs master ~ 14% improvement
|
However, the From the other profile results, this PR does not significantly reduce the core operation time cost. This is understandable, as the core operations are time-consuming, and the package import time is not a major contributor to the total time cost. In addition, on second thought, I think the lengthy import time is primarily due to the original design of SkyPilot as a monolithic client tool, which combined all packages. With the new client-server architecture, we can refactor to separate the packages for the client and server. For the command-line tool, we should only import the client-related packages. On the server side, the import time should not be an issue. This approach will naturally resolve the import time issue for client operations. So I think this PR is not a good solution for #4678, what do you think @Michaelvll ? |
The performance improvements from 10%-60% are good! One question I have is why the launch takes so long for the master in the tests. It should definitely be less than 600 seconds as shown above. Could we try it with 2 CPUs instead? Yes, reducing the dependency on the client side is definitely a good approach, but there are also some cases reducing the latency for importing sky may help, e.g. in
In this case, if we can reduce the time for importing |
OK, I will try the |
Hi @Michaelvll, here are the profiles for the sky core operations with the latest code change: For the Here are the results for the other cases with the latest change: sky launch -y -c mygpucluster --cloud kubernetes --cpus 2
4678 vs master ~ 20% improvement (in some cases, the value would be about 10%)
4678 vs master ~ 12% improvement
4678 vs master ~ 11 % improvement
4678 vs master ~ 13% improvement
|
Hi @Michaelvll, could you please help trigger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DanielZhangQD! The PR looks mostly good to me. Please find the comments below.
max_backoff_factor=MAX_BACKOFF_FACTOR) | ||
for i in range(MAX_ATTEMPTS): | ||
if method == 'get': | ||
response = self.session.get(url, headers=headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add more comments to elaborate why we don't use requests.
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I originally thought that using session
here would not trigger importing requests
early, however, after further testing, using requests
here does not trigger importing early, so I will revert the change here and only keep the lazy import change.
import requests | ||
if typing.TYPE_CHECKING: | ||
import ssl | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line is added by the format.sh
.
sky/serve/serve_utils.py
Outdated
return _get_system_memory_gb() // constants.CONTROLLER_MEMORY_USAGE_GB | ||
|
||
|
||
# Only calculate these when actually needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Only calculate these when actually needed | |
# Only calculate these when actually needed to avoid importing psutil when not necessary during import sky`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
sky/serve/service.py
Outdated
if len(serve_state.get_services()) >= serve_utils.get_num_service_threshold( | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(serve_state.get_services()) >= serve_utils.get_num_service_threshold( | |
): | |
if (len(serve_state.get_services()) >= serve_utils.get_num_service_threshold()): |
Add ()
to avoid formating to split lines in a weird way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
sky/serve/serve_utils.py
Outdated
def get_num_service_threshold(): | ||
"""Get number of services threshold, calculating it only when needed.""" | ||
global _num_service_threshold | ||
if _num_service_threshold is None: | ||
_num_service_threshold = _get_num_service_threshold() | ||
return _num_service_threshold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid shallow function like _get_num_service_threshold
and _get_system_memory_gb
.
def get_num_service_threshold(): | |
"""Get number of services threshold, calculating it only when needed.""" | |
global _num_service_threshold | |
if _num_service_threshold is None: | |
_num_service_threshold = _get_num_service_threshold() | |
return _num_service_threshold | |
def get_num_service_threshold(): | |
"""Get number of services threshold, calculating it only when needed.""" | |
global _num_service_threshold | |
if _num_service_threshold is None: | |
system_memory_gb = psutil.virtual_memory().total // (1024**3) | |
return system_memory_gb // constants.CONTROLLER_MEMORY_USAGE_GB | |
return _num_service_threshold |
Also, would it be easier to simpler to use lru_cache
instead of handling the global env var?
def get_num_service_threshold(): | |
"""Get number of services threshold, calculating it only when needed.""" | |
global _num_service_threshold | |
if _num_service_threshold is None: | |
_num_service_threshold = _get_num_service_threshold() | |
return _num_service_threshold | |
@annotations.lru_cache(scope='request') | |
def get_num_service_threshold(): | |
"""Get number of services threshold, calculating it only when needed.""" | |
system_memory_gb = psutil.virtual_memory().total // (1024**3) | |
return system_memory_gb // constants.CONTROLLER_MEMORY_USAGE_GB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, updated
sky/utils/rich_utils.py
Outdated
_console = None # Lazy initialized console | ||
|
||
|
||
# Move global console to a function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Move global console to a function | |
# Move global console to a function to avoid importing rich console if not used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
sky/utils/rich_utils.py
Outdated
_console = None # Lazy initialized console | ||
|
||
|
||
# Move global console to a function | ||
def get_console(): | ||
"""Get or create the rich console.""" | ||
global _console | ||
if _console is None: | ||
_console = rich_console.Console(soft_wrap=True) | ||
return _console |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, should we reuse the one in ux_utils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The console
here is a different instance from that in the ux_utils.py
and they are initialized with different parameters rich_console.Console(soft_wrap=True)
for this one and rich_console.Console()
for the one in ux_utils.py
. Should we just use one of them? If yes, should the soft_wrap
be set to true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think setting soft_wrap
to always be True should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks!
Hi @Michaelvll, comments addressed except for this one #4802 (comment), if we determine to use one console instance, I think we can move the console management to a separate module, e.g. |
/smoke-test |
/smoke-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DanielZhangQD! It looks mostly good to me.
Can we add a unit test to make sure that we don't add those imports back again in the future edits, i.e. we should check import sky
does not import those package immediately that we have changed to LazyImports
sky/authentication.py
Outdated
filelock = adaptors_common.LazyImport('filelock') | ||
yaml = adaptors_common.LazyImport('yaml') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to mention the time scale of the imports here in the comments, so that future maintainance can use them as a reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to comment the time of non-lazy importing filelock&yaml
here or the time of importing authentication
after this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added comments for the third-party modules in sky/adaptors/common.py
and descriptions in the contributing guide.
OK, I will add UT for this. |
* Examples | ||
* Documentation | ||
* Tutorials, blog posts and talks on SkyPilot | ||
- [Bug reports](https://github.com/skypilot-org/skypilot/issues) and [discussions](https://github.com/skypilot-org/skypilot/discussions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These kinds of changes are done by auto format tool.
Hi @Michaelvll, the following items have been done:
PTAL again. BTW, could you please help trigger the smoke test again? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @DanielZhangQD ! Just some minors
sky/utils/rich_utils.py
Outdated
@@ -219,7 +222,7 @@ def client_status(msg: str) -> Union['rich_console.Status', _NoOpConsoleStatus]: | |||
if (threading.current_thread() is threading.main_thread() and | |||
not sky_logging.is_silent()): | |||
if _statuses['client'] is None: | |||
_statuses['client'] = console.status(msg) | |||
_statuses['client'] = console_utils.get_console().status(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply put get_console
in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both rich_utils
and ux_utils
use the get_console
and it would cause cycle import here.
tests/unit_tests/test_sky_import.py
Outdated
assert 'jsonschema' not in sys.modules | ||
assert 'pendulum' not in sys.modules | ||
|
||
# Check that the lazy imported modules are imported after used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether the following assertions are behaviors we want to keep. To me, what modules are used by the function seems like an easy to change implementation detail. Maybe existing unit tests of these functions themselves would be sufficient to ensure they still work after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, I will remove the check below.
tests/unit_tests/test_sky_import.py
Outdated
# Clean modules that are lazy imported by sky | ||
modules_to_clean = [ | ||
module for module in list(sys.modules.keys()) if any( | ||
module.startswith(prefix) for prefix in [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat hard to maintain. IIUC, our intention is that a module being LazyImported should be LazyImported everywhere (otherwise lazy import just does not work). Can we just list all instances of LazyImport
class and assert these modules are not actually imported? e.g.
objs = [obj for obj in gc.get_objects() if isinstance(obj, adaptor_common.LazyImport)]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Michaelvll What do you think? Is the above way good enough?
Hi @Michaelvll @aylei, Comments addressed, PTAL again. |
/smoke-test --aws |
tests/unit_tests/test_sky_import.py
Outdated
|
||
# Check that the lazy imported modules are in the list | ||
for module in lazy_import_modules: | ||
assert module in lazy_module_names, f"Module {module} is not lazy imported" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to assert the module is not in sys.modules
I think, also, only one of lazy_module_names
or lazy_import_modules
should be kept, with different intent:
- Keeping
lazy_import_modules
: we want the listed modules to be lazy imported, whatever how it is implemented - Keeping
lazy_module_names
: we want every module that has been wrapped inadaptor_common.LazyImport()
to be finally lazy imported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I got your point. I'll go ahead and revise the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aylei Case updated, PTAL again, thanks!
Hi @Michaelvll @aylei, the UT case has been updated to address the above comments, PTAL again. |
tests/unit_tests/test_sky_import.py
Outdated
] | ||
|
||
|
||
@pytest.fixture(autouse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, autouse fixture will be called for every test function. Looks like we only need this for test_sky_import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will not be called by cases in other test files, and I will remove the autouse in case we add extra cases in the same file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks!
/smoke-test |
Hi @Michaelvll, Conflicts resolved, PTAL again, thanks! |
/smoke-test --aws |
/smoke-test --aws |
Part of #4678
This PR only covers the third-party modules to import lazily, however, from the test result below, it still takes a long time to import the whole sky module.
So this needs to be improved further, e.g.:
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh