Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stricter kubeadm validation (config and runtime checks) #11710

Merged

Conversation

VannTen
Copy link
Contributor

@VannTen VannTen commented Nov 13, 2024

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
This makes kubespray stricter regarding kubeadm config and errors in 2 ways:

  1. validate the kubeadm config files on templating
  2. don't ignore preflight errors by default. Instead we expose a variable to let users ignore certain errors if needed.

We need this in order to catch bad formatting of the config files (which are otherwise non-fatal), which could led to settings not applied at all.

Also some cleanups, and fix some stuff introducted in v1beta4 support #11674 (not caught because of the lack of that validation, precisely)

Special notes for your reviewer:
Apparently, we ignore kubeadm errors since the introduction of kubeadm support in #1631 (commit 6da20e2

Does this PR introduce a user-facing change?:

action required
`kubeadm_ignore_preflight_errors` is introduced to ignore specific preflight checks from kubeadm. The previous was effectively `all`, so some errors might surface during upgrade, in which cases, users should add the ones they choose to ignore to that variable.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Nov 13, 2024
@VannTen
Copy link
Contributor Author

VannTen commented Nov 13, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 13, 2024
@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 13, 2024
@VannTen VannTen force-pushed the cleanup/no_ignore_kubeam_errors branch from 1535f68 to 30a4be2 Compare November 14, 2024 11:23
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 14, 2024
@VannTen VannTen force-pushed the cleanup/no_ignore_kubeam_errors branch from 30a4be2 to 1d2a84f Compare November 14, 2024 11:50
The "ignoring all errors" seems to date back to the inception of the
kubeadm support (it was --skip-preflight-check before).

This can mask real errors and prevent users from seeing them.

Do not ignore any errors by default and make the set of ignored errors
configurable.
The mode is already set by the previous `copy` task.
This should help to fail early when we have invalid kubeadm configs (from
a kubespray bug or a misconfiguration).
@VannTen VannTen force-pushed the cleanup/no_ignore_kubeam_errors branch from 1d2a84f to 53501cc Compare November 14, 2024 12:03
@VannTen
Copy link
Contributor Author

VannTen commented Nov 14, 2024

Well, the validation works:

TASK [kubernetes/kubeadm : Create kubeadm client config] ***********************
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/kubernetes/kubeadm/tasks/main.yml:74
fatal: [test-vm-gv46p]: FAILED! => {"changed": false, "checksum": "562a882839726dd23f5de3de823925aa2f7ca942", "exit_status": 1, "msg": "failed to validate", "stderr": "error unmarshaling configuration schema.GroupVersionKind{Group:\"kubeadm.k8s.io\", Version:\"v1beta4\", Kind:\"JoinConfiguration\"}: strict decoding error: unknown field \"discovery.timeout\"\nTo see the stack trace of this error execute with --v=5 or higher\n", "stderr_lines": ["error unmarshaling configuration schema.GroupVersionKind{Group:\"kubeadm.k8s.io\", Version:\"v1beta4\", Kind:\"JoinConfiguration\"}: strict decoding error: unknown field \"discovery.timeout\"", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. and removed release-note-none Denotes a PR that doesn't merit a release note. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Nov 14, 2024
@VannTen
Copy link
Contributor Author

VannTen commented Nov 14, 2024

I'm not sure why the vagrant job is still failing, the v1beta4 should be fixed there as well 🤔

I had missed one.

{
  "changed": false,
  "checksum": "387f5d5b7358fd92b61070db17c06edd4e56827e",
  "exit_status": 1,
  "msg": "failed to validate",
  "stderr": "error unmarshaling configuration schema.GroupVersionKind{Group:\"kubeadm.k8s.io\", Version:\"v1beta4\", Kind:\"JoinConfiguration\"}: strict decoding error: unknown field \"discovery.timeout\"\nTo see the stack trace of this error execute with --v=5 or higher\n",
  "stderr_lines": [
    "error unmarshaling configuration schema.GroupVersionKind{Group:\"kubeadm.k8s.io\", Version:\"v1beta4\", Kind:\"JoinConfiguration\"}: strict decoding error: unknown field \"discovery.timeout\"",
    "To see the stack trace of this error execute with --v=5 or higher"
  ],
  "stdout": "",
  "stdout_lines": []
}

@VannTen VannTen force-pushed the cleanup/no_ignore_kubeam_errors branch from 8fb943c to bf7ede7 Compare November 14, 2024 14:28
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 14, 2024
@VannTen VannTen changed the title WIP: Don't ignore kubeadm errors Stricter kubeadm validation (config and runtime checks) Nov 14, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 14, 2024
@VannTen
Copy link
Contributor Author

VannTen commented Nov 14, 2024

/cc @tico88612

Copy link
Member

@tico88612 tico88612 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value of the variable --ignore-preflight-errors is all, which is weird. (face plam laugh)
It's a good change, thank you!
/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MrFreezeex, tico88612, VannTen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 15, 2024
@k8s-ci-robot k8s-ci-robot merged commit 68718dc into kubernetes-sigs:master Nov 15, 2024
41 checks passed
@VannTen VannTen deleted the cleanup/no_ignore_kubeam_errors branch November 29, 2024 15:07
kpoxo6op pushed a commit to kpoxo6op/kubespray that referenced this pull request Dec 27, 2024
…igs#11710)

* kubeadm: do not ignore preflight errors blindly

The "ignoring all errors" seems to date back to the inception of the
kubeadm support (it was --skip-preflight-check before).

This can mask real errors and prevent users from seeing them.

Do not ignore any errors by default and make the set of ignored errors
configurable.

* download/kubeadm: remove redundant task

The mode is already set by the previous `copy` task.

* Validate kubeadm configs

This should help to fail early when we have invalid kubeadm configs (from
a kubespray bug or a misconfiguration).

* kubeadm-upgrade: remove unnecessary bool cast

* Convert kubeadm join discovery timeout to v1beta4 config

* CI: Ignore kubeadm:Mem errors on some setup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants