
ServiceMonitor TLS Verification Failure: Missing IP SAN in Self-Signed Certificate #4330

Open
sbathgate opened this issue Feb 19, 2025 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@sbathgate

What happened:
Prometheus was unable to scrape the Kueue controller-manager's metrics endpoint because of a TLS certificate verification error. The self-signed certificate generated by Kueue's built-in certificate management does not include the target's IP address in its Subject Alternative Names (SANs), so verification fails with:

tls: failed to verify certificate: x509: cannot validate certificate for xxx.xxx.xxx.xxx because it doesn't contain any IP SANs

What you expected to happen:
Prometheus should scrape the metrics endpoint selected by the ServiceMonitor over HTTPS and report the target as "up", without any TLS errors.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy the Kueue controller-manager with the built-in self-signed certificates (not cert-manager) and with Prometheus enabled, i.e. with these Helm chart values:
     enablePrometheus: true
     enableCertManager: false
  2. Observe that the Prometheus logs show a TLS error similar to:
     Get "https://xxx.xxx.xxx.xxx:8443/metrics": tls: failed to verify certificate: x509: cannot validate certificate for xxx.xxx.xxx.xxx because it doesn't contain any IP SANs

Anything else we need to know?:
The issue was resolved by setting serverName in the tlsConfig section of the ServiceMonitor. The serverName value must match one of the DNS SANs on the self-signed certificate; current Go releases, which Prometheus is built with, no longer fall back to the Common Name for hostname checks. The updated ServiceMonitor endpoint, written as a Helm chart template, is:

spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    path: /metrics
    port: https
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
      serverName: {{ include "kueue.fullname" . }}-controller-manager-metrics-service.{{ .Release.Namespace }}.svc

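With serverName set, Prometheus matches the certificate against the Service DNS name, which the certificate does list, instead of the target's IP address, and the TLS error went away. Note that Go's TLS client skips both chain and hostname verification when insecureSkipVerify is true, so for serverName to be enforced as an actual check, insecureSkipVerify should be false and the CA that signed the certificate should be supplied through tlsConfig.ca.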

Environment:
• Kubernetes version (use kubectl version): 1.27.10
• Kueue version (use git describe --tags --dirty --always): v0.10.1
• Cloud provider or hardware configuration: OpenStack Ussuri
• OS (e.g: cat /etc/os-release): AlmaLinux 8

@sbathgate sbathgate added the kind/bug Categorizes issue or PR as related to a bug. label Feb 19, 2025
@kannon92
Contributor

Wow! Thank you for the report and the recommended fix.

Would you be interested in contributing a patch for this?

@mimowo
Contributor

mimowo commented Feb 24, 2025

+1. IIUC, we should start by adjusting config/components/prometheus/monitor.yaml. The chart YAML is then derived from it by ./hack/update-helm.sh, which may need adjusting too.
