[Bug] Migration issues from ES to Cassandra #1340
Comments
This commit should fix your general problem and improve performance of migration. |
It keeps failing on RC3
|
The first error (User [email protected] not found) is partially fixed by #1374. |
Can you provide the same logs from /var/log/thehive/application.log (with timing information)? I'd like to know whether the two errors are related. |
It was running for 4 days when the crash happened. It crashed frequently before RC3. I'm attaching the requested logs:
|
Unless you have several terabytes of data, the RC3 migration should not take 4 days. |
Thanks for the tip, I didn't know that I had to start from scratch. It's been running for 3 hours now. It started really fast, but it gets slower and slower as time passes. The speed is now 3 seconds per observable. That is still faster than the previous migration (8 seconds per observable), but I'm afraid it will get slower. I'm not migrating terabytes, just about 20 GB. Also, my host's specs are not bad at all. |
After 19 hours, 202 alerts and 11 cases migrated. The performance is bad at the moment so I'm resuming the previous migration. |
How many observables do you have? |
I don't know exactly, but I think I have about 200K - 300K |
You can improve performance by adding the following to application.conf:
Both settings should be set only for migration. |
So is it enough to create an additional superuser? Or do I have to create the organisation so that the database schema is prepared? |
My migration has also been running for more than 4 days, even though there are usually "only" around 30-40 observables. Currently 1600 of 7000 cases have been imported. |
The RC3 version of the migration tool increases migration speed only if the database schema was created by it (i.e. the run is not a resume of a migration started with a previous version). |
OK, thanks :) So I'm already using the new database schema. At first the migration was much faster, but later it got slower and slower. |
Do you know how to properly set this value?
|
@mamoedo In fact, you can't change |
@To-om I see that you made some changes to improve the migration. Will they be available soon in an RC so we can all speed up our migrations? |
Also, I've been testing performance and some alternatives to speed up the migration. When creating alerts with thehive4py 1.5.3, the performance problem seems to be in the observables upload. Is it checking whether the observable already exists in Cassandra before creating it? That could explain the performance degradation over time. A simple alert with 7 observables takes 1 minute.
Here you can see the script used:
|
Last week's speed was about 7-10 seconds per observable. Today it's taking 30 seconds per observable. I don't think this will finish anytime soon :( |
The next release will improve the migration process (not yet committed). On my test database, observable creation time goes from 275 ms (average over the whole migration) to 7 ms. The time stays stable throughout the migration. |
@To-om Using these patches (yes, really!) ;) the import still seems to get slower and slower. Here is some error log; it seems there is also a scroll issue with ES that causes problems:
|
I have the same error almost every week. Now it takes 7 hours to resume the migration and start importing observables after a crash. |
In order to increase write performance, I removed the indexes. But without an index, search is very slow, which is why I keep the identifiers of certain objects (case model and tag, for example) in memory during the migration. When the migration resumes, the identifiers of the objects already migrated are not known, so they are searched for in the database (hence the warning "The request requires an iteration on all the vertices"). I will prevent all searches during migration: I can create a new object instead of retrieving the existing one. The migration process will then remove duplicates once the indexes are operational. |
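The strategy described in this comment can be sketched roughly as follows. This is only an illustration in Python, not TheHive's migration code: `MigrationStore`, `get_or_create`, `resume` and `remove_duplicates` are hypothetical names, and an in-memory list stands in for the graph database.

```python
from collections import defaultdict
from itertools import count


class MigrationStore:
    """Toy stand-in for a graph database without indexes (hypothetical)."""

    def __init__(self):
        self._next_id = count(1)
        self.vertices = []   # list of (vertex_id, natural_key)
        self.id_cache = {}   # natural_key -> vertex_id, valid for this run only

    def get_or_create(self, key):
        # Never search the database: without an index, a lookup would
        # iterate over all vertices. Use the in-memory cache or create.
        if key in self.id_cache:
            return self.id_cache[key]
        vertex_id = next(self._next_id)
        self.vertices.append((vertex_id, key))
        self.id_cache[key] = vertex_id
        return vertex_id

    def resume(self):
        # After a crash the cache is lost, so objects that were already
        # migrated will be created again as duplicates.
        self.id_cache = {}

    def remove_duplicates(self):
        # Run once indexes are operational: keep the first vertex per key
        # and return a mapping duplicate_id -> surviving_id for re-linking.
        by_key = defaultdict(list)
        for vertex_id, key in self.vertices:
            by_key[key].append(vertex_id)
        remap = {}
        for ids in by_key.values():
            for duplicate in ids[1:]:
                remap[duplicate] = ids[0]
        self.vertices = [(ids[0], key) for key, ids in by_key.items()]
        return remap
```

The trade-off in this sketch is that writes stay constant-time during the migration, at the cost of one cleanup pass at the end.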
Using the new 4.0.0 release (https://download.thehive-project.org/thehive4-4.0.0-1.zip), I'm also seeing performance degrade over time, even though I started the migration from scratch. It has been 4 days since it started and it's not over yet. Here are some logs showing the performance issue, especially on alert creation: cat migration.log | cut -d " " -f11 | grep ms
...
...
|
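To see whether the timings actually trend upward, the output of the `cut`/`grep` pipeline can be summarized per bucket. The following is a minimal sketch in Python, assuming (as the command above suggests) that the 11th space-separated field of a log line holds a value such as `275ms`; the function names are hypothetical and the real log layout may differ.

```python
import re
from statistics import mean

# A number immediately followed by "ms", e.g. "275ms" or "7.5ms".
_MS = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def extract_ms(lines, field=11):
    """Return the durations (in ms) found in the given space-separated field."""
    durations = []
    for line in lines:
        parts = line.rstrip("\n").split(" ")  # roughly what cut -d " " does
        if len(parts) < field:
            continue
        match = _MS.search(parts[field - 1])
        if match:
            durations.append(float(match.group(1)))
    return durations


def bucket_means(durations, buckets=4):
    """Average the durations over consecutive buckets to expose a trend."""
    if not durations:
        return []
    size = max(1, len(durations) // buckets)
    return [mean(durations[i:i + size]) for i in range(0, len(durations), size)]
```

For example, `bucket_means(extract_ms(open("migration.log")))` would give a handful of averages in chronological order; steadily increasing values would confirm the slowdown.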
The migration finished after 12 days (too long compared to other migrations with less data, see #1465). But it showed some strange errors at the end. Is it something to worry about, @To-om? Also, do you know a way to remove the audits before migrating? They seem to be the largest and least important thing for me to migrate.
|
Also, does anybody know how to use the --case-from-date param in the migration tool? It fails:
And I tried both with and without quotes |
Request Type
Bug
Work Environment
Problem Description
I have several problems with the migration.
I haven't made any modifications in Elasticsearch, so this is a simple TheHive system that has just gone through several updates and migrations from version 2 up to the latest version 3.
Apart from the migration being very slow (see the other issue; this is 250 MB of data), there are several failures.
After 4 days, with just some demo data, the migration stopped with this error message:
During the import, errors like this appeared several times:
Can you please help here?