[core][experimental] Accelerated DAG should execute work on actor's main thread #46336
Comments
Default behavior: the DAG should run on the main actor thread.
Additional API: if user specifies a thread name during
If a thread is already being used by some other DAG, throw an error during compile.
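The "thread already in use" check above could be sketched roughly as follows. This is only an illustration in plain Python, not Ray code: `ThreadRegistry`, `claim`, `release`, and the thread name `"main"` are all hypothetical names, and the sketch assumes the check happens once per DAG at compile time.

```python
class ThreadRegistry:
    """Hypothetical per-actor record of which DAG owns which thread."""

    def __init__(self):
        self._owners = {}  # thread name -> id of the DAG using it

    def claim(self, thread_name, dag_id):
        # Refuse the claim if a different DAG already owns the thread.
        owner = self._owners.get(thread_name)
        if owner is not None and owner != dag_id:
            raise ValueError(
                f"thread {thread_name!r} is already used by DAG {owner!r}"
            )
        self._owners[thread_name] = dag_id

    def release(self, thread_name, dag_id):
        # Only the owning DAG can give the thread back (e.g. on teardown).
        if self._owners.get(thread_name) == dag_id:
            del self._owners[thread_name]


registry = ThreadRegistry()
registry.claim("main", "dag-1")  # first DAG takes the (assumed) default thread

conflict = False
try:
    registry.claim("main", "dag-2")  # second DAG asks for the same thread
except ValueError:
    conflict = True

registry.release("main", "dag-1")
registry.claim("main", "dag-2")  # succeeds once the first DAG released it
```

Raising at claim time keeps the failure at compile rather than at first execution, which matches the behavior proposed above.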
I will be working on this.
system-level tasks => ex: close channel, resize channel
@stephanie-wang I took a look at the issue. The problem is that any normal It does not seem worth it to support accessing thread-local state while disabling users from submitting normal tasks.

def test_simulate_pipeline_parallelism(ray_start_regular, single_fetch):
    ...
    worker_0 = Worker.remote(0)
    worker_1 = Worker.remote(1)
    # Worker 0: FFFBBB
    # Worker 1: BBB
    with InputNode() as inp:
        w0_input = worker_0.read_input.bind(inp)
        ...
        output_dag = MultiOutputNode([d03, d04, d05])
    output_dag = output_dag.experimental_compile()
    res = output_dag.execute([0, 1, 2])
    ...
    assert ray.get(res) == [0, 1, 2]
    assert ray.get(worker_0.get_logs.remote()) == [  # <-- block here
        "FWD rank-0, batch-0",
        "FWD rank-0, batch-1",
        "FWD rank-0, batch-2",
        "BWD rank-0, batch-0",
        "BWD rank-0, batch-1",
        "BWD rank-0, batch-2",
    ]

I used py-spy to check the main thread of
Maybe it is not worth supporting access to thread-local state if it blocks normal tasks.
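The blocking described above can be reproduced without Ray. The sketch below is an analogy only: it uses a single-worker `ThreadPoolExecutor` as a stand-in for an actor's default thread, and `execution_loop` and `get_logs` are hypothetical placeholders for the DAG loop and a normal actor task. It assumes normal tasks and the loop share one thread, which is the scenario under discussion.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Stand-in for an actor's single default thread (assumption for
# illustration; this is not how Ray is implemented).
default_executor = ThreadPoolExecutor(max_workers=1)
stop_loop = threading.Event()


def execution_loop():
    # Hypothetical DAG execution loop: holds the thread until torn down.
    while not stop_loop.is_set():
        time.sleep(0.01)
    return "loop exited"


def get_logs():
    # Hypothetical normal task submitted while the loop is running.
    return ["FWD rank-0, batch-0"]


loop_future = default_executor.submit(execution_loop)
logs_future = default_executor.submit(get_logs)

try:
    logs_future.result(timeout=0.2)
    blocked = False
except TimeoutError:
    # The normal task is queued behind the loop and cannot start.
    blocked = True

stop_loop.set()  # tear down the loop so the queued task can run
logs = logs_future.result(timeout=5)
default_executor.shutdown()
```

The normal task only completes after the loop is stopped, which mirrors the `get_logs` call hanging in the test above.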
#50032 attempts to move the execution loop to the same thread as the default thread (the thread used when
Based on the two conclusions mentioned above, it is difficult to achieve our goal of putting the execution loop and
b000620 this commit moves the constructor to run on the default executor. I ran the following script and the output is attached below. Both

def test_execute_on_actor_thread(shutdown_only):
    import threading

    @ray.remote
    class ThreadLocalActor:
        def __init__(self):
            # data local to actor default executor thread
            print("init")
            current_thread = threading.current_thread()
            print(current_thread)
            self.local_data = threading.local()
            print("local data", self.local_data)
            self.local_data.value = 42
            print("local data value", self.local_data.value)
            print("value", id(self.local_data.value))

        def compute(self, value):
            print("compute")
            current_thread = threading.current_thread()
            print(current_thread)
            print("local data", self.local_data)
            if hasattr(self.local_data, "value"):
                print("self.local_data.value", self.local_data.value)
            else:
                print("self.local_data.value is not set")
            return value + self.local_data.value

    actor = ThreadLocalActor.remote()
    assert ray.get(actor.compute.remote(1)) == 43
Description
Currently actor tasks in the DAG execute on a background thread on the actor, which is a bit of a gotcha when the actor has some thread-local state that is initialized before the DAG is created. We should move the execution loop to the main thread and execute other system-level tasks on a background thread.
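The gotcha can be seen with the standard library alone. The following is a minimal sketch, not Ray code: the main thread plays the role of the thread that initializes the actor's state, and `read_state` is a hypothetical stand-in for a DAG task running on a background thread.

```python
import threading

# State initialized on the main thread, as an actor might do before
# the DAG is created.
local_state = threading.local()
local_state.value = 42

seen_in_background = {}


def read_state():
    # Attributes on threading.local are per-thread, so this thread does
    # not see the value that was set on the main thread.
    seen_in_background["has_value"] = hasattr(local_state, "value")


worker = threading.Thread(target=read_state)
worker.start()
worker.join()

seen_on_main = hasattr(local_state, "value")
print("main thread sees value:", seen_on_main)
print("background thread sees value:", seen_in_background["has_value"])
```

Running the task on the same thread that initialized the state is what makes the value visible, which is the motivation for moving the execution loop.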
Use case
No response