Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the Supervisor endpoint to not restart the Supervisor if the spec was unmodified #17707

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

aho135
Copy link
Contributor

@aho135 aho135 commented Feb 8, 2025

Description

This PR adds an optional query parameter called restartIfUnmodified to the /druid/indexer/v1/supervisor endpoint. The caller can optionally set restartIfUnmodified=false so that the supervisor is not restarted if the spec is unchanged. Multiple members of the community mentioned that they maintain their own scripts to check whether the spec has changed before submitting to the endpoint: https://apachedruidworkspace.slack.com/archives/C0303FDCZEZ/p1738017586080509

For those that rely on this endpoint for restarting the supervisor, the behavior remains unchanged as restartIfUnmodified defaults to true.

This PR also helps avoid unnecessary updates to the metastore when updating retention rules by first checking if the rules were updated.

Release note

Adds an optional query parameter called restartIfUnmodified to the /druid/indexer/v1/supervisor endpoint. Callers can set restartIfUnmodified=false to not restart the supervisor if the spec is unchanged. Example:

curl -X POST --header "Content-Type: application/json" -d @supervisor.json localhost:8888/druid/indexer/v1/supervisor?restartIfUnmodified=false


Key changed/added classes in this PR
  • SupervisorResource
  • SupervisorManager
  • SupervisorResourceTest
  • SQLMetadataRuleManager
  • SQLMetadataRuleManagerTest

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aho135 , thanks for the PR. It makes sense to not update/restart the supervisor if not required.

I have left some minor feedback on the approach.

@AmatyaAvadhanula
Copy link
Contributor

AmatyaAvadhanula commented Feb 9, 2025

Thank you for these changes, @aho135!

I think we would benefit from a change where we check if the spec has changed. If it hasn't we still restart the supervisor, but do not go to the metadata store and add an unnecessary entry in the spec history. Otherwise, the flow remains unchanged. I think @kfaraz has suggested this as well.
I believe the supervisor restart itself would be helpful to rollover tasks easily or to get out of an idle supervisor state etc.

I also wanted to understand if the problem was with the metadata operations associated with it including an unneeded entry, or if the supervisor operation is also problematic.

If it is just the first case, is a feature flag really needed?
I believe we should skip the metadata operation and history update as there is no benefit in both cases

If you still believe that the supervisor operation is wasteful, and want to introduce a flag, please add the relevant docs in docs/api-reference/supervisor-api.md.

@aho135
Copy link
Contributor Author

aho135 commented Feb 10, 2025

Thank you for these changes, @aho135!

I think we would benefit from a change where we check if the spec has changed. If it hasn't we still restart the supervisor, but do not go to the metadata store and add an unnecessary entry in the spec history. Otherwise, the flow remains unchanged. I think @kfaraz has suggested this as well. I believe the supervisor restart itself would be helpful to rollover tasks easily or to get out of an idle supervisor state etc.

I also wanted to understand if the problem was with the metadata operations associated with it including an unneeded entry, or if the supervisor operation is also problematic.

If it is just the first case, is a feature flag really needed? I believe we should skip the metadata operation and history update as there is no benefit in both cases

If you still believe that the supervisor operation is wasteful, and want to introduce a flag, please add the relevant docs in docs/api-reference/supervisor-api.md.

Thanks for the review @AmatyaAvadhanula! My original motivation for this change was to avoid unnecessary restarts of the Supervisor if possible. Our use case is that we maintain a repository of schemas and do periodic releases. It is often unclear which schemas were actually modified. We want to be able to submit them all, and just restart the Supervisors which had schema updates. This is so we can avoid the undesirable side effects of task restart, such as small segments.

With this use case in mind, I think that having the feature flag does make sense. I will add an update in the relevant doc

@aho135
Copy link
Contributor Author

aho135 commented Feb 11, 2025

Hi @kfaraz @AmatyaAvadhanula I've made the changes you suggested and added some unit tests. Would appreciate if you could take a look again. Thank you!

@aho135 aho135 requested a review from kfaraz February 18, 2025 21:54
@aho135
Copy link
Contributor Author

aho135 commented Feb 24, 2025

@kfaraz I've added an additional check for retention rules to avoid metastore updates if the retention rules were unchanged. This brings the behavior in line with the compaction config endpoint: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/CoordinatorConfigManager.java#L110-L111

@JRobTS
Copy link

JRobTS commented Feb 25, 2025

Every language defaults boolean values to false and many treat null as false.

Maybe a small thing but I feel like the feature flag should default to false when not present. Introducing the new flag as something like skipRestartIfUnmodified=true would be better.

@aho135
Copy link
Contributor Author

aho135 commented Feb 26, 2025

Every language defaults boolean values to false and many treat null as false.

Maybe a small thing but I feel like the feature flag should default to false when not present. Introducing the new flag as something like skipRestartIfUnmodified=true would be better.

Thanks for the review @JRobTS Agreed that defaulting the flag to false makes more sense. I updated this in the subsequent PR. Please let me know what you think!

#### Sample request with restartIfUnmodified
The following example sets the restartIfUnmodified flag to false. With this flag set to false, the Supervisor will only restart if there has been a modification to the SupervisorSpec.
#### Sample request with skipRestartIfUnmodified
The following example sets the skipRestartIfUnmodified flag to true. With this flag set to false, the Supervisor will only restart if there has been a modification to the SupervisorSpec.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the 2nd sentence: "With this flag set to false ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good catch @JRobTS this was updated in the subsequent commit

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update on this, @aho135 .

Comment on lines +402 to +407
log.error(e,
"Failed to upgrade pending segment[%s] to new pending segment[%s] on Supervisor[%s].",
upgradedPendingSegment.getUpgradedFromSegmentId(),
upgradedPendingSegment.getId().getVersion(),
supervisorId
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.error(e,
"Failed to upgrade pending segment[%s] to new pending segment[%s] on Supervisor[%s].",
upgradedPendingSegment.getUpgradedFromSegmentId(),
upgradedPendingSegment.getId().getVersion(),
supervisorId
);
log.error(
e,
"Failed to upgrade pending segment[%s] to new pending segment[%s] on Supervisor[%s].",
upgradedPendingSegment.getUpgradedFromSegmentId(),
upgradedPendingSegment.getId().getVersion(),
supervisorId
);

Comment on lines +124 to +125
@Context final HttpServletRequest req,
@QueryParam("skipRestartIfUnmodified") boolean skipRestartIfUnmodified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Make the request the last argument of this method.

Comment on lines +124 to +125
@Context final HttpServletRequest req,
@QueryParam("skipRestartIfUnmodified") boolean skipRestartIfUnmodified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a boxed Boolean instead and handle nulls to make the default behaviour more obvious.

@@ -60,6 +62,7 @@
@RunWith(EasyMockRunner.class)
public class SupervisorManagerTest extends EasyMockSupport
{
private static final ObjectMapper MAPPER = new DefaultObjectMapper();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an empty line after this

manager.start();
Assert.assertFalse(manager.shouldUpdateSupervisor(spec));
Assert.assertTrue(manager.shouldUpdateSupervisor(spec2));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an empty line after this.

@@ -2409,6 +2409,7 @@ ddSketch
DDSketch
druid-ddsketch
numBins
skipRestartIfUnmodified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wouldn't need this spelling entry if you use skipRestartIfUnmodified (with backquotes) instead of skipRestartIfUnmodified in the docs.

@@ -2353,6 +2353,63 @@ Content-Length: 1359
</TabItem>
</Tabs>

#### Sample request with skipRestartIfUnmodified
The following example sets the skipRestartIfUnmodified flag to true. With this flag set to true, the Supervisor will only restart if there has been a modification to the SupervisorSpec.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following example sets the skipRestartIfUnmodified flag to true. With this flag set to true, the Supervisor will only restart if there has been a modification to the SupervisorSpec.
The following example sets the `skipRestartIfUnmodified` flag to true. With this flag set to true, the Supervisor will only restart if there has been a modification to the SupervisorSpec.

* Checks whether the submitted SupervisorSpec differs from the current spec in SupervisorManager's supervisor list.
* This is used in SupervisorResource specPost to determine whether the Supervisor needs to be restarted
* @param spec The spec submitted
* @return boolean - false if the spec is unchanged, else true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return boolean - false if the spec is unchanged, else true
* @return true only if the spec has been modified, false otherwise

Comment on lines +192 to +196
if (currentSupervisor != null &&
Arrays.equals(specAsBytes, jsonMapper.writeValueAsBytes(currentSupervisor.rhs))
) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may simplify this as follows:

Suggested change
if (currentSupervisor != null &&
Arrays.equals(specAsBytes, jsonMapper.writeValueAsBytes(currentSupervisor.rhs))
) {
return false;
}
return currentSupervisor == null
|| !Arrays.equals(specAsBytes, jsonMapper.writeValueAsBytes(currentSupervisor.rhs);

* @param spec The spec submitted
* @return boolean - false if the spec is unchanged, else true
*/
public boolean shouldUpdateSupervisor(SupervisorSpec spec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the suggestion from @AmatyaAvadhanula , you can also update the createOrUpdate method to create a new entry in DB only if needed.

/**
 * Creates or updates a supervisor and then starts it.
 * If no change has been made to the supervisor spec, it is only restarted.
 *
 * @return true if the supervisor was updated, false otherwise
 */ 
public boolean createOrUpdateAndStartSupervisor(SupervisorSpec spec)
  {
    Preconditions.checkState(started, "SupervisorManager not started");
    Preconditions.checkNotNull(spec, "spec");
    Preconditions.checkNotNull(spec.getId(), "spec.getId()");
    Preconditions.checkNotNull(spec.getDataSources(), "spec.getDatasources()");

    synchronized (lock) {
      Preconditions.checkState(started, "SupervisorManager not started");
      
      // Persist a new version of the spec only if it has been updated
      final boolean shouldUpdateSpec = shouldUpdateSupervisor(spec);
      
      possiblyStopAndRemoveSupervisorInternal(spec.getId(), false);
      createAndStartSupervisorInternal(spec, shouldUpdateSpec);
      
      return shouldUpdateSpec;
    }
  }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants