Syncing latest changes from main for ramen #277

Merged
merged 44 commits into from
May 28, 2024

df-build-team

PR containing the latest commits from main branch

rakeshgm and others added 30 commits May 14, 2024 13:54
add ConfigMap under dr-cluster kustomize
transformer to update label "app: ramen-dr-cluster"

Signed-off-by: rakeshgm <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
When fetching the same cache item concurrently, for example from the
same addon on 2 clusters, or when an addon and the fetch cron job run
concurrently, one fetcher can delete the temporary file used by the
other fetcher, causing this error:

    drenv.commands.Error: Command failed:
       command: ('addons/rook-cephfs/start', 'dr1')
       exitcode: 1
       error:
          Traceback (most recent call last):
            File "/home/.../go/src/github.com/ramendr/ramen/test/addons/rook-cephfs/start", line 46, in <module>
              deploy(cluster)
            File "/home/.../go/src/github.com/ramendr/ramen/test/addons/rook-cephfs/start", line 17, in deploy
              cache.fetch(".", path)
            File "/home/.../go/src/github.com/ramendr/ramen/test/drenv/cache.py", line 28, in fetch
              os.rename(tmp, dest)
          FileNotFoundError: [Errno 2] No such file or directory: '/home/.../.cache/drenv/addons/rook-cephfs.yaml.tmp'
                             -> '/home/.../.cache/drenv/addons/rook-cephfs.yaml'

Fixed by using a temporary file per process. If we have 2 fetchers, the
last one wins, renaming its temporary file to the actual cache file.
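
The per-process temporary file approach can be sketched roughly as
follows. This is a minimal illustration, not the actual
drenv/cache.py code: the function name `write_cache_item` and the
`.<pid>.tmp` suffix are assumptions, and `os.replace` stands in for the
`os.rename` call seen in the traceback.

```python
import os


def write_cache_item(data: bytes, dest: str) -> None:
    """Write data to dest via a temporary file unique to this process."""
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    # Hypothetical naming scheme: including the pid keeps concurrent
    # fetchers from sharing (and deleting) each other's temporary file.
    tmp = f"{dest}.{os.getpid()}.tmp"
    try:
        with open(tmp, "wb") as f:
            f.write(data)
        # Atomic on POSIX when tmp and dest are on the same file system;
        # with several fetchers the last rename wins.
        os.replace(tmp, dest)
    finally:
        # Remove the temporary file if the rename did not happen.
        if os.path.exists(tmp):
            os.remove(tmp)
```

Because every process writes to its own temporary path, no fetcher can
remove a file that another fetcher is about to rename.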

Example run with multiple fetchers:

    $ drenv clear
    2024-05-13 00:15:59,145 INFO    [main] Clearing cache
    2024-05-13 00:15:59,146 INFO    [main] Cache cleared in 0.00 seconds

    $ for i in 1 2 3 4; do (drenv fetch envs/regional-dr.yaml &); done
    2024-05-13 00:15:59,318 INFO    [rdr] Fetching
    2024-05-13 00:15:59,320 INFO    [rdr] Running addons/rook-operator/fetch
    2024-05-13 00:15:59,321 INFO    [rdr] Fetching
    2024-05-13 00:15:59,322 INFO    [rdr] Running addons/rook-cluster/fetch
    2024-05-13 00:15:59,322 INFO    [rdr] Running addons/rook-toolbox/fetch
    2024-05-13 00:15:59,323 INFO    [rdr] Running addons/rook-operator/fetch
    2024-05-13 00:15:59,323 INFO    [rdr] Running addons/rook-cephfs/fetch
    2024-05-13 00:15:59,323 INFO    [rdr] Running addons/recipe/fetch
    2024-05-13 00:15:59,323 INFO    [rdr] Running addons/csi-addons/fetch
    2024-05-13 00:15:59,325 INFO    [rdr] Running addons/rook-cluster/fetch
    2024-05-13 00:15:59,325 INFO    [rdr] Running addons/rook-toolbox/fetch
    2024-05-13 00:15:59,325 INFO    [rdr] Running addons/rook-cephfs/fetch
    2024-05-13 00:15:59,327 INFO    [rdr] Running addons/ocm-controller/fetch
    2024-05-13 00:15:59,333 INFO    [rdr] Running addons/csi-addons/fetch
    2024-05-13 00:15:59,341 INFO    [rdr] Running addons/ocm-controller/fetch
    2024-05-13 00:15:59,345 INFO    [rdr] Running addons/recipe/fetch
    2024-05-13 00:15:59,356 INFO    [rdr] Fetching
    2024-05-13 00:15:59,365 INFO    [rdr] Running addons/rook-operator/fetch
    2024-05-13 00:15:59,371 INFO    [rdr] Fetching
    2024-05-13 00:15:59,374 INFO    [rdr] Running addons/rook-operator/fetch
    2024-05-13 00:15:59,377 INFO    [rdr] Running addons/rook-cluster/fetch
    2024-05-13 00:15:59,378 INFO    [rdr] Running addons/csi-addons/fetch
    2024-05-13 00:15:59,388 INFO    [rdr] Running addons/rook-cluster/fetch
    2024-05-13 00:15:59,391 INFO    [rdr] Running addons/recipe/fetch
    2024-05-13 00:15:59,395 INFO    [rdr] Running addons/rook-cephfs/fetch
    2024-05-13 00:15:59,397 INFO    [rdr] Running addons/rook-cephfs/fetch
    2024-05-13 00:15:59,411 INFO    [rdr] Running addons/ocm-controller/fetch
    2024-05-13 00:15:59,412 INFO    [rdr] Running addons/csi-addons/fetch
    2024-05-13 00:15:59,414 INFO    [rdr] Running addons/rook-toolbox/fetch
    2024-05-13 00:15:59,418 INFO    [rdr] Running addons/recipe/fetch
    2024-05-13 00:15:59,419 INFO    [rdr] Running addons/rook-toolbox/fetch
    2024-05-13 00:15:59,450 INFO    [rdr] Running addons/ocm-controller/fetch
    2024-05-13 00:16:00,521 INFO    [rdr] addons/rook-toolbox/fetch completed in 1.20 seconds
    2024-05-13 00:16:00,638 INFO    [rdr] addons/csi-addons/fetch completed in 1.26 seconds
    2024-05-13 00:16:00,793 INFO    [rdr] addons/rook-cephfs/fetch completed in 1.47 seconds
    2024-05-13 00:16:00,804 INFO    [rdr] addons/rook-cephfs/fetch completed in 1.41 seconds
    2024-05-13 00:16:00,830 INFO    [rdr] addons/rook-toolbox/fetch completed in 1.51 seconds
    2024-05-13 00:16:00,831 INFO    [rdr] addons/csi-addons/fetch completed in 1.51 seconds
    2024-05-13 00:16:00,922 INFO    [rdr] addons/rook-cluster/fetch completed in 1.54 seconds
    2024-05-13 00:16:00,938 INFO    [rdr] addons/rook-toolbox/fetch completed in 1.52 seconds
    2024-05-13 00:16:00,987 INFO    [rdr] addons/rook-cephfs/fetch completed in 1.66 seconds
    2024-05-13 00:16:01,106 INFO    [rdr] addons/rook-toolbox/fetch completed in 1.69 seconds
    2024-05-13 00:16:01,130 INFO    [rdr] addons/rook-cluster/fetch completed in 1.81 seconds
    2024-05-13 00:16:01,191 INFO    [rdr] addons/csi-addons/fetch completed in 1.86 seconds
    2024-05-13 00:16:01,234 INFO    [rdr] addons/rook-cluster/fetch completed in 1.91 seconds
    2024-05-13 00:16:01,267 INFO    [rdr] addons/rook-cluster/fetch completed in 1.88 seconds
    2024-05-13 00:16:01,314 INFO    [rdr] addons/csi-addons/fetch completed in 1.90 seconds
    2024-05-13 00:16:01,414 INFO    [rdr] addons/rook-cephfs/fetch completed in 2.02 seconds
    2024-05-13 00:16:01,591 INFO    [rdr] addons/recipe/fetch completed in 2.25 seconds
    2024-05-13 00:16:01,597 INFO    [rdr] addons/recipe/fetch completed in 2.27 seconds
    2024-05-13 00:16:01,696 INFO    [rdr] addons/recipe/fetch completed in 2.31 seconds
    2024-05-13 00:16:01,938 INFO    [rdr] addons/recipe/fetch completed in 2.52 seconds
    2024-05-13 00:16:02,094 INFO    [rdr] addons/rook-operator/fetch completed in 2.73 seconds
    2024-05-13 00:16:02,248 INFO    [rdr] addons/rook-operator/fetch completed in 2.87 seconds
    2024-05-13 00:16:02,252 INFO    [rdr] addons/rook-operator/fetch completed in 2.93 seconds
    2024-05-13 00:16:02,321 INFO    [rdr] addons/rook-operator/fetch completed in 3.00 seconds
    2024-05-13 00:16:05,471 INFO    [rdr] addons/ocm-controller/fetch completed in 6.02 seconds
    2024-05-13 00:16:05,472 INFO    [rdr] Fetching finishied in 6.10 seconds
    2024-05-13 00:16:05,918 INFO    [rdr] addons/ocm-controller/fetch completed in 6.51 seconds
    2024-05-13 00:16:05,919 INFO    [rdr] Fetching finishied in 6.56 seconds
    2024-05-13 00:16:06,020 INFO    [rdr] addons/ocm-controller/fetch completed in 6.69 seconds
    2024-05-13 00:16:06,021 INFO    [rdr] Fetching finishied in 6.70 seconds
    2024-05-13 00:16:06,394 INFO    [rdr] addons/ocm-controller/fetch completed in 7.05 seconds
    2024-05-13 00:16:06,394 INFO    [rdr] Fetching finishied in 7.07 seconds

Fixes: RamenDR#1386
Signed-off-by: Nir Soffer <[email protected]>
The csi-hostpath-driver and volumesnapshots addons start much slower
with minikube 1.33. Replacing them with rook ceph rbd storage, the
kubevirt environments start up to 1.93 times faster.

Start times in seconds before and after this change:

| env          | local before | local after | lab before | lab after |
|--------------|--------------|-------------|------------|-----------|
| rdr-kubevirt |          600 |         475 |        920 |       603 |
| kubevirt     |          270 |         230 |        603 |       312 |
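
The quoted speedup can be checked from the table with a few lines of
Python. This is only an arithmetic sketch; the names `START_TIMES` and
`speedups` are made up for the example, and the speedup is taken to be
the "before" time divided by the "after" time.

```python
# Start times copied from the table above:
# (local before, local after, lab before, lab after)
START_TIMES = {
    "rdr-kubevirt": (600, 475, 920, 603),
    "kubevirt": (270, 230, 603, 312),
}


def speedups(times):
    """Return (local, lab) speedup factors computed as before / after."""
    local_before, local_after, lab_before, lab_after = times
    return local_before / local_after, lab_before / lab_after


for env, times in START_TIMES.items():
    local, lab = speedups(times)
    print(f"{env}: local {local:.2f}x, lab {lab:.2f}x")
```

The largest factor is the kubevirt environment in the lab, 603 / 312,
which is where the "up to 1.93 times faster" figure comes from.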

Signed-off-by: Nir Soffer <[email protected]>
It is easier to debug issues with a minimal environment. With
rook-cephfs and the required volumesnapshots minikube addon, the rook
environment is less minimal, but it is still quicker to start compared
with the full regional-dr environment.

Example run:

    $ drenv start envs/rook.yaml
    2024-05-12 21:44:00,426 INFO    [rook] Starting environment
    2024-05-12 21:44:00,483 INFO    [dr1] Starting minikube cluster
    2024-05-12 21:44:00,483 INFO    [dr2] Starting minikube cluster
    2024-05-12 21:44:38,650 INFO    [dr1] Cluster started in 38.17 seconds
    2024-05-12 21:44:39,090 INFO    [dr1/0] Running addons/rook-operator/start
    2024-05-12 21:44:39,090 INFO    [dr1/1] Running addons/csi-addons/start
    2024-05-12 21:44:59,732 INFO    [dr2] Cluster started in 59.25 seconds
    2024-05-12 21:45:00,218 INFO    [dr2/0] Running addons/rook-operator/start
    2024-05-12 21:45:00,218 INFO    [dr2/1] Running addons/csi-addons/start
    2024-05-12 21:45:08,913 INFO    [dr1/1] addons/csi-addons/start completed in 29.82 seconds
    2024-05-12 21:45:13,552 INFO    [dr1/0] addons/rook-operator/start completed in 34.46 seconds
    2024-05-12 21:45:13,552 INFO    [dr1/0] Running addons/rook-cluster/start
    2024-05-12 21:45:30,186 INFO    [dr2/1] addons/csi-addons/start completed in 29.97 seconds
    2024-05-12 21:45:41,444 INFO    [dr2/0] addons/rook-operator/start completed in 41.23 seconds
    2024-05-12 21:45:41,444 INFO    [dr2/0] Running addons/rook-cluster/start
    2024-05-12 21:46:21,806 INFO    [dr1/0] addons/rook-cluster/start completed in 68.25 seconds
    2024-05-12 21:46:21,806 INFO    [dr1/0] Running addons/rook-toolbox/start
    2024-05-12 21:46:25,669 INFO    [dr1/0] addons/rook-toolbox/start completed in 3.86 seconds
    2024-05-12 21:46:25,669 INFO    [dr1/0] Running addons/rook-pool/start
    2024-05-12 21:46:40,768 INFO    [dr1/0] addons/rook-pool/start completed in 15.10 seconds
    2024-05-12 21:46:40,768 INFO    [dr1/0] Running addons/rook-cephfs/start
    2024-05-12 21:47:01,116 INFO    [dr2/0] addons/rook-cluster/start completed in 79.67 seconds
    2024-05-12 21:47:01,116 INFO    [dr2/0] Running addons/rook-toolbox/start
    2024-05-12 21:47:01,689 INFO    [dr1/0] addons/rook-cephfs/start completed in 20.92 seconds
    2024-05-12 21:47:01,689 INFO    [dr1/0] Running addons/rook-cephfs/test
    2024-05-12 21:47:04,421 INFO    [dr2/0] addons/rook-toolbox/start completed in 3.31 seconds
    2024-05-12 21:47:04,421 INFO    [dr2/0] Running addons/rook-pool/start
    2024-05-12 21:47:08,994 INFO    [dr1/0] addons/rook-cephfs/test completed in 7.30 seconds
    2024-05-12 21:47:29,597 INFO    [dr2/0] addons/rook-pool/start completed in 25.18 seconds
    2024-05-12 21:47:29,597 INFO    [dr2/0] Running addons/rook-cephfs/start
    2024-05-12 21:47:44,236 INFO    [dr2/0] addons/rook-cephfs/start completed in 14.64 seconds
    2024-05-12 21:47:44,236 INFO    [dr2/0] Running addons/rook-cephfs/test
    2024-05-12 21:47:51,296 INFO    [dr2/0] addons/rook-cephfs/test completed in 7.06 seconds
    2024-05-12 21:47:51,296 INFO    [rook/0] Running addons/rbd-mirror/start
    2024-05-12 21:48:41,169 INFO    [rook/0] addons/rbd-mirror/start completed in 49.87 seconds
    2024-05-12 21:48:41,169 INFO    [rook/0] Running addons/rbd-mirror/test
    2024-05-12 21:48:52,317 INFO    [rook/0] addons/rbd-mirror/test completed in 11.15 seconds
    2024-05-12 21:48:52,317 INFO    [rook] Environment started in 291.89 seconds

Signed-off-by: Nir Soffer <[email protected]>
We added csi-hostpath-driver as a quick temporary solution until we have
cephfs storage. Now that we have it, we can replace it and enjoy reduced
start time, in particular with minikube 1.33.

To replace csi-hostpath-driver, we have to add cephfs to the volsync
development environment. This is slower locally, but faster in the e2e
lab. For regional-dr, this is always faster, up to 1.82 times faster in
the e2e lab.

The main difference is cluster start time - minikube addons are loaded
before minikube start returns.

Before:

    2024-05-12 23:01:42,844 INFO    [dr2] Cluster started in 433.20 seconds
    2024-05-12 23:02:07,215 INFO    [dr1] Cluster started in 457.57 seconds

After:

    2024-05-12 23:18:13,386 INFO    [hub] Cluster started in 71.87 seconds
    2024-05-12 23:18:46,943 INFO    [dr2] Cluster started in 105.43 seconds

Start times in seconds before and after this change:

| env          | local before | local after | lab before | lab after |
|--------------|--------------|-------------|------------|-----------|
| regional-dr  |          636 |         426 |        780 |       427 |
| volsync      |          261 |         352 |        520 |       395 |

Signed-off-by: Nir Soffer <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
It looks like a recent change in pylint triggers this incorrect report:

    drenv/commands.py:234:28: E0606: Possibly using variable
    'input_view' before assignment (possibly-used-before-assignment)

This cannot happen since we don't register proc.stdin if input is None,
so when we reach this block input_view is assigned. However, disabling
the check risks missing a real issue in that block.

Let's change the code so pylint can understand it better. This also makes
it easier for humans to understand. The cost is negligible: adding 2
temporary variables even when they are never used.
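
The shape of such a change might look like the sketch below. It is not
the actual drenv/commands.py code: `send_chunks`, `input_offset`, and
the chunking logic are hypothetical, and only the name `input_view`
comes from the pylint report. The point is that both variables are
assigned unconditionally before the block that uses them.

```python
def send_chunks(data=None, chunk_size=4):
    """Split data into chunks, the way a writer might feed a pipe."""
    # Assigned unconditionally, even when there is no input, so both
    # readers and linters can see the names are always bound.
    input_view = memoryview(data) if data is not None else None
    input_offset = 0

    chunks = []
    if input_view is not None:
        while input_offset < len(input_view):
            chunk = input_view[input_offset:input_offset + chunk_size]
            chunks.append(bytes(chunk))
            input_offset += len(chunk)
    return chunks
```

With no input the two variables are created and never used, which is
the negligible cost mentioned above.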

Signed-off-by: Nir Soffer <[email protected]>
Signed-off-by: jacklu <[email protected]>
Minikube v1.33.1 includes the fixes we added recently for v1.33.0, so we
don't need to set up or clean up anything. We could remove the code and
require users and developers to upgrade to the latest version, but it is
nicer to make this transparent and skip the unneeded configuration.

We can remove the special fixes for minikube 1.33.0 later, maybe when
1.34 is released.
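
A version check of this kind could be sketched as below. The helper
names `parse_version` and `needs_workarounds` are hypothetical, and the
rule that only 1.33.0 needs the extra configuration is an assumption
drawn from the description above, not the actual drenv code.

```python
def parse_version(text: str) -> tuple:
    """Parse a version like "v1.33.1" or "1.33.1" into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def needs_workarounds(version: str) -> bool:
    """Return True if the minikube-specific configuration is needed."""
    # Assumption: only 1.33.0 needs the extra sysctl and
    # systemd-resolved configuration; 1.33.1 already ships the fixes.
    return parse_version(version) == (1, 33, 0)


for version in ("1.33.0", "1.33.1"):
    action = "Configuring" if needs_workarounds(version) else "Skipping"
    print(f"minikube {version}: {action} sysctl and systemd-resolved")
```

Comparing parsed tuples rather than raw strings avoids surprises with
versions such as 1.9 and 1.10.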

Example run with minikube 1.33.1:

    $ drenv setup -v
    2024-05-18 00:19:54,127 INFO    [main] Setting up minikube for drenv
    2024-05-18 00:19:54,152 DEBUG   [minikube] Using minikube version 1.33.1
    2024-05-18 00:19:54,152 DEBUG   [minikube] Skipping sysctl configuration
    2024-05-18 00:19:54,153 DEBUG   [minikube] Skipping systemd-resolved configuration

Signed-off-by: Nir Soffer <[email protected]>
The note is correct but not helpful at this point. Let's drop
unnecessary details like we did for docs/user-quick-start.md.

Signed-off-by: Nir Soffer <[email protected]>
Signed-off-by: Abhijeet Shakya <[email protected]>
abhijeet219 and others added 14 commits May 20, 2024 16:08
This change starts using the cache for kustomization resources, so
starting the addon can directly use the cached resources.

Changes:
- drenv fetch can be used to fetch resources anytime.
- Starting an addon will first try to fetch resources, then apply the
  fetched resources. If there is no change, fetch won't do anything,
  so it takes very little time.
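
The fetch-then-apply flow described in the list above can be sketched
like this. Everything here is illustrative: `fetch_cached`,
`start_addon`, `build`, `apply`, and `max_age` are hypothetical names,
and treating a recently written cache file as "no change" is an
assumption, since the commit message does not say how changes are
detected.

```python
import os
import time


def fetch_cached(dest, build, max_age=3600.0):
    """Refresh dest using build() unless a fresh copy is already cached.

    Returns True if the cache was refreshed, False if it was reused.
    """
    # Assumption: "no change" is approximated by the cached file's age.
    if os.path.exists(dest) and time.time() - os.path.getmtime(dest) < max_age:
        return False
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    tmp = f"{dest}.{os.getpid()}.tmp"
    with open(tmp, "w") as f:
        f.write(build())
    os.replace(tmp, dest)
    return True


def start_addon(dest, build, apply):
    """Fetch resources if needed, then apply the cached copy."""
    fetch_cached(dest, build)
    with open(dest) as f:
        apply(f.read())
```

Here `build` would produce the kustomization output and `apply` would
hand it to the cluster; when the cache is fresh, only the cheap check
and the apply step run.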

Fixes: RamenDR#1337
Signed-off-by: Abhijeet Shakya <[email protected]>
Since we upgraded, the e2e job is failing[1] (due to a bug in the e2e
integration, the job does not fail!). Let's try to go back to olm 0.22
since we know it worked before this change[2].

[1] last good build: https://github.com/RamenDR/ramen/actions/runs/9134579476/job/25120395289
[2] first bad build: https://github.com/RamenDR/ramen/actions/runs/9158274985/job/25177239838

Signed-off-by: Nir Soffer <[email protected]>
This is required because we can have two PVCs with the same name when
multinamespace support is enabled.

Signed-off-by: Raghavendra Talur <[email protected]>
Also, create the rd and rs in the same namespace as the PVC and not VRG.

Signed-off-by: Raghavendra Talur <[email protected]>
Signed-off-by: Raghavendra Talur <[email protected]>
Now that we have a basic test running, we want to fail the workflow
if the tests fail. Without this, people assume that code changes
passed the tests.

Signed-off-by: Nir Soffer <[email protected]>
The error handling in kubeObjectsRecoveryStartOrResume() is very
confusing: the code tries to avoid duplicating error handling in 2
unrelated code paths (ok=true, ok=false), leading to referencing a nil
request when ok is false.

We need to refactor this later; for now, just skip cleanup if there is
nothing to clean up.

Bug: https://bugzilla.redhat.com/2282284
Signed-off-by: Nir Soffer <[email protected]>
@df-build-team df-build-team requested a review from a team May 27, 2024 06:35

openshift-ci bot commented May 27, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: df-build-team

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ShyamsundarR ShyamsundarR merged commit e0a2bfa into release-4.17 May 28, 2024
57 of 59 checks passed
7 participants