What happened + What you expected to happen
Ray Serve failed to recover from a checkpoint with the stack trace below. One of the deployments being recovered was named "model1#infer_actor". That name was accepted as a deployment name, but the "#" in it apparently breaks the delimiter splitting in ReplicaID.from_full_id_str().
ray.exceptions.ActorDiedError: The actor died because of an error raised in its creation task, ray::SERVE_CONTROLLER_ACTOR:ServeController.__init__() (pid=22146, ip=x.x.x.x, actor_id=9a9a8479cde4a0a033e57ee602000000, repr=<ray.serve._private.controller.ServeController object at 0x7f6dfad5de40>)
File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/controller.py", line 177, in __init__
self.deployment_state_manager = DeploymentStateManager(
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 2348, in __init__
self._recover_from_checkpoint(
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 2457, in _recover_from_checkpoint
deployment_to_current_replicas = self._map_actor_names_to_deployment(
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 2390, in _map_actor_names_to_deployment
replica_id = ReplicaID.from_full_id_str(replica_name)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/common.py", line 70, in from_full_id_str
raise ValueError(
ValueError: Given replica ID string SERVE_REPLICA::model1#infer_actor#model1#infer_actor#15k12vcw didn't match expected pattern, ensure it has either two or three fields with delimiter '#'.
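To illustrate why the extra "#" trips the parser, here is a minimal sketch of delimiter-based parsing that follows the rule stated in the error message (two or three "#"-separated fields). It is not Ray's actual implementation: `parse_replica_id`, the returned dict layout, and the example name `app1` are hypothetical, and only the `SERVE_REPLICA::` prefix and the failing ID string are taken from the trace above.

```python
# Hypothetical stand-in for ReplicaID.from_full_id_str(); NOT Ray's code.
REPLICA_PREFIX = "SERVE_REPLICA::"  # prefix as it appears in the error message
DELIMITER = "#"


def parse_replica_id(full_id: str) -> dict:
    """Split a replica ID into fields, expecting two or three of them."""
    body = full_id
    if body.startswith(REPLICA_PREFIX):
        body = body[len(REPLICA_PREFIX):]

    fields = body.split(DELIMITER)
    if len(fields) == 3:
        app, deployment, unique_id = fields
    elif len(fields) == 2:
        app = ""
        deployment, unique_id = fields
    else:
        # A "#" inside any name adds extra fields and lands here.
        raise ValueError(
            f"Given replica ID string {full_id} didn't match expected pattern: "
            f"got {len(fields)} fields with delimiter '{DELIMITER}'."
        )
    return {"app": app, "deployment": deployment, "unique_id": unique_id}


# A name without '#' parses cleanly.
print(parse_replica_id("SERVE_REPLICA::app1#model1#15k12vcw"))

# The ID from the trace splits into five fields, so parsing fails.
try:
    parse_replica_id("SERVE_REPLICA::model1#infer_actor#model1#infer_actor#15k12vcw")
except ValueError as exc:
    print("parse failed:", exc)
```

Under this sketch, any name containing the delimiter produces an ID that cannot be split back unambiguously, which matches the failure seen during recovery.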
Versions / Dependencies
Ray 2.36.0, python 3.10, linux
Reproduction script
Create deployment with a # in the name.
Crash your cluster in some horrible way that makes you have to recover from a checkpoint.
Profit?
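Until this is fixed, one possible workaround is to reject such names on the user side before deploying. The sketch below is an assumption, not part of Ray Serve: `check_deployment_name` and `RESERVED_DELIMITER` are hypothetical names, and the only fact taken from this report is that "#" is the delimiter used in replica IDs.

```python
# Hypothetical pre-deploy guard; Ray Serve does not provide this helper.
RESERVED_DELIMITER = "#"  # delimiter named in the ValueError above


def check_deployment_name(name: str) -> str:
    """Return the name unchanged, or raise if it contains the delimiter."""
    if RESERVED_DELIMITER in name:
        raise ValueError(
            f"Deployment name {name!r} contains {RESERVED_DELIMITER!r}, "
            "which may break replica ID parsing on checkpoint recovery."
        )
    return name


# A name without the delimiter passes through.
print(check_deployment_name("model1_infer_actor"))

# The name from this report is rejected before it reaches the cluster.
try:
    check_deployment_name("model1#infer_actor")
except ValueError as exc:
    print("rejected:", exc)
```

Calling a check like this before passing a name to the deployment would surface the problem at deploy time rather than during recovery.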
Issue Severity
Low: It annoys or frustrates me.
chmeyers added the bug and triage labels on Oct 24, 2024
edoakes added the P1 (issue that should be fixed within a few weeks) and serve labels and removed the triage label on Oct 24, 2024