Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add "dry_run" label to argocd_app_info metric #22010

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jsolana
Copy link
Contributor

@jsolana jsolana commented Feb 25, 2025

Signed-off-by: Javier Solana [email protected]

Fiixes #21899

Add a "dry_run" label to distinguish dryrun activity from real ones to argocd_app_info metric.

The main issue is that if there are alerts based on these metrics, and the dry-run execution identifies an error (e.g., a change that violates a Kyverno policy or an invalid CRD schema) updates the metric, which can potentially trigger alerts based on it.

Adding the dry_run label allow distinguish real activity from dryrun. Eg:

# alert manager alert definition
- alert: ArgoCDApplicationUnknown
  expr: sum by (cluster) (argocd_app_info{sync_status="Unknown", dry_run!="true"}) > 0
  for: 15m
...

Note: This situation has been occurring since the origins of the argocd_app_info metric.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@jsolana jsolana requested a review from a team as a code owner February 25, 2025 19:36
Copy link

bunnyshell bot commented Feb 25, 2025

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

Copy link

codecov bot commented Feb 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.74%. Comparing base (49a4b7f) to head (c81082c).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #22010      +/-   ##
==========================================
+ Coverage   55.71%   55.74%   +0.02%     
==========================================
  Files         342      342              
  Lines       57074    57078       +4     
==========================================
+ Hits        31800    31818      +18     
+ Misses      22632    22619      -13     
+ Partials     2642     2641       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@agaudreault agaudreault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dry-run is a temporary parameter on the operation. It is not meant to be persisted in the application information since it is a transient attribute. It would make sense to add it to "operations" metrics, because it is the operation that is in dry-run. This way you could know how many dry-run vs normal operations are made.

I think this is not the correct way to address the linked issue

@jsolana
Copy link
Contributor Author

jsolana commented Feb 25, 2025

It would make sense to add it to "operations" metrics, because it is the operation that is in dry-run. This way you could know how many dry-run vs normal operations are made.

I think both approaches make sense.

On one hand, adding the dry_run label to metrics more related to operations, such as argocd_app_k8s_request_total and argocd_app_sync_total, makes sense. However, I also believe it is reasonable to include it in argocd_app_info, despite its transient nature. This is because, at the moment metrics are collected, the application's reflected state could correspond to a dry-run operation. With this label, it can be ignored at the alert definition level.

The main idea of this issue is not only solve questions like how many dry-run vs normal operations are made but the status of my apps are real or based in a dryrun?

Copy link
Member

@nitishfy nitishfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the failing tests that you asked on slack, have you tried running make codegen-local command before pushing changes?

…_total and argocd_app_sync_total

Signed-off-by: Javier Solana <[email protected]>
@jsolana
Copy link
Contributor Author

jsolana commented Feb 26, 2025

For the failing tests that you asked on slack, have you tried running make codegen-local command before pushing changes?

Already done locally and there is no change. With a new commit it is solved then, It seems was a transient error. Thanks for your help!

Copy link

@skyscannerBajpai skyscannerBajpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@skyscannerBajpai
Copy link

Should we also add dry_run label to logAppEvent (https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L2540) ?

@jsolana jsolana requested a review from agaudreault February 27, 2025 17:39
@jsolana
Copy link
Contributor Author

jsolana commented Feb 27, 2025

Should we also add dry_run label to logAppEvent (https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L2540) ?

Mm maybe is interesting for logs perspective to add dryrun info but I think it already add this info in logs.

Anyway in this case I'll prefer to move it in a different PR and issue cause is not as easy as modify the contract of logAppEvent (internally it uses ctrl.auditLogger.LogAppEvent that is used extensively throughout the project)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants