-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add stopped status for HealthCheck #25423
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Honny1 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
If the container is stopped and the ongoing HealthCheck has no chance to complete the check is evaluated as stopped. Fixes: https://issues.redhat.com/browse/RUN-2520 Fixes: containers#25276 Signed-off-by: Jan Rodák <[email protected]>
dcc1405
to
d6d7ea1
Compare
if !c.batched { | ||
c.lock.Lock() | ||
defer c.lock.Unlock() | ||
|
||
if err := c.syncContainer(); err != nil { | ||
logrus.Errorf("Error syncing container %s state: %v", c.ID(), err) | ||
return false | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really not look and sync, IMO this function should simply not exist. While I think your usage for healthcheck is fine thus will totally be misused. First a private function should not take the container look here, that is totally inconsistent with how deal with the majority of libpod functions.
Most importantly this can only lead to unsound code, you should never check the state, then unlock and then do X,Y,Z based on the state while being unlocked. So IMO just delete this func.
if exitCode != 0 && c.ensureCurrentState(define.ContainerStateStopped, define.ContainerStateStopping, define.ContainerStateExited) { | ||
hcResult = define.HealthCheckContainerStopped | ||
} | ||
|
||
hcl := newHealthCheckLog(timeStart, timeEnd, returnCode, eventLog) | ||
|
||
healthCheckResult, err := c.updateHealthCheckLog(hcl, inStartPeriod, isStartup) | ||
healthCheckResult, err := c.updateHealthCheckLog(hcl, hcResult, inStartPeriod, isStartup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this ensureCurrentState takes the lock and updateHealthCheckLog takes the lock again, this is inefficient and should not happen.
I would say remove both locks from the functions and take on lock here in the caller instead.
if hcResult == define.HealthCheckContainerStopped { | ||
healthCheck.Status = define.HealthCheckStopped | ||
healthCheck.FailingStreak = oldFailingStreak | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the wrong ordering, there is no need to reset FailingStreak when you don't update it in the first place.
if hcResult == define.HealthCheckContainerStopped {
healthCheck.Status = define.HealthCheckStopped
} else if !inStartPeriod {
// increment failing streak
healthCheck.FailingStreak++
// if failing streak > retries, then status to unhealthy
if healthCheck.FailingStreak >= c.HealthCheckConfig().Retries {
healthCheck.Status = define.HealthCheckUnhealthy
}
}
|
||
run_podman run -d --name $ctr \ | ||
--health-cmd "sleep 20; echo $msg" \ | ||
$IMAGE /home/podman/pause | ||
|
||
timeout --foreground -v --kill=10 60 \ | ||
$PODMAN healthcheck run $ctr & | ||
$PODMAN healthcheck run $ctr > $hcStatus & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use &>
to capture both stdout and stderr, in case of an error printed we will see it directly in the assert below
@@ -487,13 +488,20 @@ function _check_health_log { | |||
rc=0 | |||
wait -n $hc_pid || rc=$? | |||
assert $rc -eq 1 "exit status check of healthcheck command" | |||
assert $(cat $hcStatus) =~ "stopped" "Health status" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"$(< $hcStatus)"
, cat is not needed. It Would also be good if you could do a full string match ==
here to avoid any extra error lines printed
run_podman inspect $ctr --format "{{.State.Health.Status}}" | ||
assert "$output" == "stopped" "Health status" | ||
|
||
run_podman inspect $ctr --format "{{.State.Health.FailingStreak}}" | ||
assert "$output" == "0" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running podman inspect over and over is slow, you can get all the data in one single inspect command
i.e. see #24749 (comment) and 8fa1ffb
This PR adds new status for HealthCheck. If the container is stopped and the ongoing HealthCheck has no chance to complete, the check is evaluated as stopped.
Fixes: https://issues.redhat.com/browse/RUN-2520
Fixes: #25276
Does this PR introduce a user-facing change?