I've been meaning to provide feedback on the upgrade functionality for quite some time, but life has gotten in the way. Maybe this should have been multiple issues, and I might have missed some details or points, but it is what it is.
The latest Elastic docs and the implemented process differ in a few places:
- Stopping ML nodes is not implemented.
- The docs say "cluster.routing.allocation.enable": "primaries", while the implementation uses none.
Things I have observed during testing:
- The wait period for the cluster to return to green status is not always long enough.
- Sometimes the cluster never returns to green status because there are no eligible nodes for the replica shards.
- If a node fails, the entire play should abort. Currently it just drops the failed node and keeps running for the remaining nodes.
Questions:
Is the "cluster.routing.allocation.enable" value based on earlier recommendations, or is there another reason to choose none over primaries?
My biggest blocker currently is that the cluster remains in a yellow state when there are replicas with no eligible nodes. The docs say to proceed with the upgrade in these cases. This means we would have to check the init and relo columns in _cat/health?v=true. This might be trivial or far from trivial; I'm not sure, to be honest.
Regarding failing the entire play vs. a single node, this might be caused by something in my Ansible setup or in my playbook. I haven't had time to give this a hard look yet.
Adding a task to start/stop ML nodes should be trivial. I might drop a PR for this if/when I find the time.
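A rough sketch of the yellow-state check described above, in case it helps the discussion. It assumes the health data is fetched as one row of `GET _cat/health?format=json`, where every column value arrives as a string. The function name `safe_to_proceed` is hypothetical and not part of the collection, and the rule it encodes (yellow is acceptable once nothing is initializing or relocating) is my reading of the docs, not a confirmed implementation.

```python
def safe_to_proceed(health: dict) -> bool:
    """Return True when the upgrade can move on to the next node.

    `health` is assumed to be one row of `GET _cat/health?format=json`,
    so every value (including the shard counts) is a string.
    """
    status = health["status"]
    if status == "green":
        return True
    if status == "red":
        return False
    # Yellow: continue only once no shard is still initializing or
    # relocating. Whatever is left unassigned at that point is waiting
    # for a node that is not available yet.
    return int(health["init"]) == 0 and int(health["relo"]) == 0


if __name__ == "__main__":
    # Hypothetical sample row: yellow, nothing moving, 4 replicas unassigned.
    row = {"status": "yellow", "init": "0", "relo": "0", "unassign": "4"}
    print(safe_to_proceed(row))
```

In a playbook, the same condition could drive the retry loop of the health-check task, so that the wait ends on green or on a settled yellow instead of only on green.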
As you know from our collaboration, "life getting in the way" is a thing I know all too well. Your contributions are very welcome, and I'm personally extremely thankful for the work you put into this.
To be honest, the "old implementation" came from very old documentation, written when updates were mostly done manually. Until now we have not encountered a problem with the update procedure. That doesn't mean I don't believe problems exist; we only work with a limited number of different setups.
Please provide PRs against #349 as this will be the new way for updates/upgrades.
I haven't encountered an upgrade where the system was left without eligible nodes. My personal approach would be to treat this as a fatal error that needs manual intervention. On the other hand, with the code you provided I don't see a problem with using it in the future, all the more since the official documentation supports the idea of proceeding.